What is ELT?
ELT stands for “Extract, Load, Transform.” It is the cloud-warehouse era approach to sequencing of the processes needed to build a data warehouse, and it is integral to the “Modern Data Stack (MDS).”
Extract
Extract is the process by which data is replicated from a source system and temporarily stored so that the next step of the process can be executed. Replication is usually done by one of three methods:
- Direct connection to a database where the contents of the database are replicated. This can be done through a query to extract certain data or using logs to replicate all the changes to the database.
- Export of a flat-file (CSV, Excel/Google Sheet for new systems or fixed length, tab-delimited for older systems). The flat-file is usually generated by the source system and will be made available to the extraction platform.
- API (application programming interface), which usually comes in the REST standard and stored in either JSON or XML
Load
Load is the process by which the data that was extracted from the source system is loaded into the target database. As ELT is more often an approach used for building data stacks for analytical purposes, we often use bulk loading approaches as we load data periodically vs. continuously.
The target database will often be an OLAP (online analytical processing) database as this type of database is optimized for analytics. Some key notes around the load process:
- Database and flat-file data will often be the exact same format as the source system
- API may differ from the source system, as JSON and XML enable semi-structured data, which may not be supported in the target database and thus need to be converted to a structured format
Transform
Transform is the process of taking loaded data and applying business logic to that data. Transform is often executed via code that is written in SQL or Python, which enables a business to implement complex logic to manipulate or combine the data and results in new tables that are better suited to solving challenges at hand.