Reverse ETL: what it is and why it works so well
This emerging data-syncing method is growing in popularity, but what exactly does it entail – and what’s so attractive about it when comparable processes are already available?
Data professionals are perhaps rivalled only by members of the armed forces in their fondness for acronym-laced jargon. The industry is swimming in an alphabet soup of TLAs (three-letter abbreviations) that serve to baffle outsiders. And now there’s a new term to add to the lexicon: it’s called reverse ETL (extract, transform, load).
It’s an important concept for anyone with a role in managing information. In a nutshell, it describes the process of transferring data back to the systems from which it originated.
In the long-established traditional ETL process, a company gathers data from across the enterprise and stores it all together in a data warehouse. Having everything collated in a central repository is great for analytics purposes, but there is a disadvantage: the material is stuck in one place. The various applications from which the data was obtained, such as customer relationship management (CRM) systems, are likely to be out of sync with each other. They may contain obsolete and/or incomplete information, which makes life difficult for the front-line users of those systems.
In this way, the data warehouse – created to prevent silos – becomes the biggest silo of all. It means that the ETL process, which helped to move data generated by several applications to the warehouse, needs to be reversed to return material to these applications.
“Reverse ETL means using your data warehouse as a hub,” says Rob Jones, CEO at data services provider Qbase. “This allows you to combine all your incoming data feeds into a single version of the truth, apply enhancements and share these ‘golden’ records back to the original data sources. This will make the data in all your applications more accurate and complete. This synchronisation is a key principle in master data management.”
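The hub model Jones describes can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: the feeds, field names and merge rule (newest non-empty value wins) are all invented for the example.

```python
def build_golden_record(feeds):
    """Merge per-source records for one customer into a single 'golden'
    record, preferring the most recently updated non-empty value."""
    golden = {}
    # Sort feeds oldest-first so values from newer records overwrite older ones.
    for record in sorted(feeds, key=lambda r: r["updated_at"]):
        for field, value in record.items():
            if field != "updated_at" and value not in (None, ""):
                golden[field] = value
    return golden

def sync_back(golden, connectors):
    """Fan the golden record back out to each source application.
    `connectors` maps a system name to a callable that performs the write."""
    for name, push in connectors.items():
        push(golden)

# Hypothetical feeds for one customer, from a CRM and a billing system.
crm = {"email": "ann@example.com", "phone": "", "updated_at": "2023-01-10"}
billing = {"email": "ann@example.com", "phone": "555-0199", "updated_at": "2023-02-01"}

golden = build_golden_record([crm, billing])
```

In a real deployment each connector would call the destination application’s API; the point of the sketch is simply that enhancement happens once, centrally, and every source then receives the same record.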
Sales and marketing applications – Salesforce, HubSpot and Tableau, for instance – are the most cited examples of reverse ETL destinations.
Jones explains that one of the main reasons for engaging in reverse ETL is to provide front-line staff with enriched data. “For example, a salesperson using a CRM application will be able to offer a given customer far better service if they can see not only the data in the CRM, but also information about the customer that has been generated elsewhere – that person’s browsing behaviour on the company’s website, for instance. This insight helps them to form a better understanding of what the customer is really interested in, enabling them to have a more tailored and effective conversation with them during their next call.”
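The kind of enrichment Jones mentions might look like the following sketch, which attaches a customer’s most-viewed product category to their CRM record. The field and event names are invented for illustration.

```python
from collections import Counter

def enrich_contact(contact, page_views):
    """Return a copy of a CRM contact annotated with the customer's
    most-viewed product category from web-analytics events."""
    categories = Counter(view["category"] for view in page_views)
    enriched = dict(contact)
    if categories:
        enriched["top_interest"] = categories.most_common(1)[0][0]
    return enriched

# Hypothetical CRM record and browsing events gathered in the warehouse.
contact = {"id": 42, "name": "Ann"}
views = [
    {"category": "laptops"},
    {"category": "monitors"},
    {"category": "laptops"},
]

enriched = enrich_contact(contact, views)
```

The enriched record – not the raw clickstream – is what gets pushed back into the CRM, so the salesperson sees “top interest: laptops” before the next call.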
Why don’t these applications simply access data from the warehouse whenever it’s needed?
Darren Timmins, chief technology officer of analytics provider Intuita, says there are four main reasons why it’s worth going to the trouble of using reverse ETL. The first concerns the prevention of bottlenecks. Source applications are built to handle many simultaneous enquiries on small data sets – sales reps running CRM requests, for instance. Data warehouses, by contrast, are built to run large-scale enquiries submitted by a smaller number of users. Reverse ETL enables each system to play to its strengths.
The second reason centres on resilience. Relying on a central repository to answer all queries would render the system vulnerable to total failure in the event of an outage at the warehouse. Pushing data back to the applications enables them to operate autonomously even if the warehouse is down.
Third is the matter of cost control. Data warehouse technologies are powerful, but they come with big overheads. Reverse ETL allows data to be queried on local, nimble and low-cost infrastructure.
And last, but not least, comes network security. Timmins observes that “pushing only the minimal amount of data required to a system reduces the footprint of data that could be viewed by a third party if part of the system were to be compromised”.
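Timmins’ final point – pushing only the minimal data required – is often implemented as a per-destination allow-list applied before anything leaves the warehouse. A simple sketch, with invented destinations and field names:

```python
# Hypothetical per-destination field allow-lists: each downstream system
# receives only the columns it genuinely needs.
ALLOWED_FIELDS = {
    "crm": {"id", "email", "top_interest"},
    "helpdesk": {"id", "email"},
}

def minimal_payload(row, destination):
    """Strip a warehouse row down to the fields a destination is allowed
    to receive, shrinking the footprint exposed if that system is breached."""
    allowed = ALLOWED_FIELDS[destination]
    return {k: v for k, v in row.items() if k in allowed}

# A warehouse row containing sensitive revenue data that should stay central.
row = {"id": 1, "email": "a@example.com", "revenue": 9999, "top_interest": "laptops"}
```

A compromised helpdesk account would expose only an ID and an email address, not the revenue figures held in the warehouse.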
Executing a reverse ETL process is made easier by the existence of specialist tools. Among the most popular of these are Census, Hightouch, Grouparoo and Seekwell.
But is it the best approach? There are sceptics. Mark Sheldon, chief technology officer at Sidetrade, an artificial intelligence platform for finance teams, believes that the whole concept can be made redundant by syncing data in a more obvious way.
“Reverse ETL was dead before it was born,” he argues. “Why companies are adding another layer of complexity and tooling to interface with systems of execution is beyond me.”
The application programming interface (API) is usually sufficient, he explains, adding: “With reverse ETL, in essence all we want to do is push insights to their final destination – and this process should be as simple as possible. Every front-line system on the planet has an API designed for exactly this purpose, yet we seem reluctant to use them. My advice to any chief technology officer is: don’t concentrate too much on reverse ETL. Keep integrations simple, but focus everything on the execution by tracking a success metric and improving iteratively over time.”
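The direct integration Sheldon has in mind is little more than an authenticated HTTP call to the destination’s API. The sketch below builds such a request using only the standard library; the endpoint, contact ID and field are invented, and the request is constructed here rather than sent.

```python
import json
import urllib.request

def build_update_request(base_url, contact_id, insight):
    """Build a PATCH request that writes a single insight straight to a
    front-line system's REST API, with no reverse-ETL layer in between."""
    payload = json.dumps({"top_interest": insight}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/contacts/{contact_id}",
        data=payload,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical CRM endpoint; urlopen(req) would dispatch it in practice.
req = build_update_request("https://crm.example.com/api", 42, "laptops")
```

This is the simplicity Sheldon is arguing for: one insight, one endpoint, one request – with success measured by a tracked metric rather than by the sophistication of the pipeline.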
Other critics note that a data warehouse can be optimised to run operational analytics without pushing the data back to applications. Advocates of this warehouse-centric approach argue that it is ‘cleaner’, because the material is kept in a single location, thereby reducing the risk that errors will creep in. Moreover, it can even be made cheaper by using open-source distributed SQL databases rather than expensive legacy providers.
In any case, such esoteric arguments are likely to be of limited interest outside the profession. End users simply want the most current and complete set of records possible. To them, questions about whether they should obtain this material through API access, reverse ETL or querying the warehouse each time are irrelevant.
Nonetheless, reverse ETL is a nice option to have. It means that data can be collated and crunched in one place using heavy-duty analytics and also fed back into front-line applications to equip sales staff with all the information they need. For many organisations, the process may be the best, and maybe the only, way to achieve this.