Data lakehouses are becoming increasingly popular in business as leaders seek to eke out efficiencies from their operations and unlock new opportunities by analysing the data they gather every day. Some data leaders might have heard the term, or seen competitors harnessing the data lakehouse in their own business. But what exactly is it? What potential does it offer? And how can businesses ensure they get it right?
The arrival of the lakehouse is a natural evolution in the world of business data. As organisations began to integrate more and more data into their daily processes, the need for the so-called data warehouse emerged. Data warehouses are management systems that organise data in a way that makes it easy to examine, stacked on metaphorical shelves and in rows. But data sometimes defies categorisation: it can be too messy or haphazard to fit into the warehouse structure. And while the solution may seem to be the flexibility of the data lake, which stores raw data in whatever form it arrives, challenges with governance and structure can turn a lake into a data swamp if organisations aren’t careful.
Enter the data lakehouse, which combines the flexibility, cost-efficiency and scale of data lakes with the data management capabilities of the warehouse. Pools of unorganised data can be analysed where they are, negating the need to force them into boxes they don’t really fit.
“The lakehouse is an architectural paradigm and effectively a standard we’re trying to form in the market,” says Dael Williamson, EMEA field CTO at Databricks. In many ways, it doesn’t matter what exactly the data is: it can still be analysed through the data lakehouse. “It’s about ‘how do I organise my data in the most rational, standardised and efficient way to be able to streamline production and distribution of any form of data?’,” adds Williamson.
Putting the lakehouse to work
Databricks helps customers, including Condé Nast, H&M, Gousto, La Liga, and over 40% of the Fortune 500, to unify their data, analytics and AI using its own data lakehouse platform. Among them is Fastned, a superfast electric vehicle charging company, which overhauled the infrastructure behind its business to embrace the data lakehouse. “As we scaled, we wanted to get our hands on streaming data from chargers for near real-time insights, but were unable to deliver on that with our old infrastructure,” says Bruna Maia, data and insights manager at Fastned.
But that changed when the company adopted a data lakehouse platform that enabled it to make better use of the reams of data it had access to. “We have been able to structure streaming pipelines and create better standards regarding our data engineering,” says Maia. “We can now ensure network uptime at a higher scale.”
It’s not just traditional businesses that can benefit from using a data lakehouse in their operations. During the coronavirus pandemic, lakehouses came into their own, helping to collate the information used to formulate responses to rising case numbers and to develop plans for tackling the virus. “Covid tests or the covid vaccine are a great example of where we had to take big sets of data and pull them together from disparate organisations,” says Robin Sutara, field CTO at Databricks.
The healthcare sector’s response to coronavirus is an ideal example of the opportunities the data lakehouse enables, says Sutara. “The lakehouse empowers data sharing across organisations that traditionally have not shared their data before, and the value that you can drive for society as a result, when you unlock the power of those datasets,” she says. It’s a model that could be carried forward to cross-sector collaboration in other fields, for the benefit not just of the businesses involved, but of society as a whole.
Smart utilisation of data lakehouses can help businesses run more efficiently, too – something that’s vital given the economic challenges facing organisations in all areas and of all sizes at present. “I think every organisation is really starting to think about what the rising cost of energy, food shortages and other [challenges] mean for them,” says Sutara. “How do they make sure that they’re using their data as efficiently and as effectively as possible to ensure that they’re driving the best value that they can for their consumers and their customers?”
Shipping and transport companies have also benefitted enormously from the power of data lakehouses. Around 20 companies are involved in forwarding a single freight cargo from origin to destination. “There’s the container, there’s the actual ship itself, there’s insurance, there’s the broker, there’s the whole logistics play. And that can often be a very, very complicated piece of work,” says Williamson. Friction in that process, which often involves handling and interpreting unstructured data, can be disastrous, which is why lakehouses can add real value. Each company can set up its systems to pull only the data relevant to its part of the process from the lakehouse, without worrying about the rest.
Identifying new ways to innovate
But data lakehouses don’t only come into their own when driving efficiencies or continuing with business as usual. “I always appreciate the societal impact you can have when you unlock the power of your data,” says Sutara. Pulling data into a lakehouse allows organisations to make linkages they previously may not have considered.
It’s possible to invent new lines of business or identify areas for growth simply by pooling data in one place and interrogating it without constraints. Pooled data can reveal patterns where previously there appeared to be none, and boost the bottom line in ways that hadn’t been considered until that point. It surfaces opportunities that would otherwise go unidentified.
Nor is that limited to the world of business. “We haven’t even thought about the innovation, the capability, and the impact it can have on humans and the Earth and all of those things that we want to make better for future generations,” says Sutara. “I just think it’s amazing what we’re going to be able to do to deal with it, once we have our arms around it.”
To find out more, visit databricks.com