Data democratisation is the goal for a number of organisations, but there are many challenges standing in the way. Data lakes – but not as we know them – can provide an answer.
With data increasingly viewed as the lifeblood of any modern organisation, the idea of democratising that data is taking hold among business leaders.
Put simply, data democratisation places the power of data into employees’ hands, rather than keeping it hidden from view or restricted to a select few. When access to data is limited, its potential is greatly diminished. But democratisation can provide fast and valuable data-driven insights, often on the frontline, where they are needed most.
However, more than simply making more data available to more people, the concept often requires a rethink about how organisations manage, distribute and consume data. It can also involve widespread cultural change across the business.
According to Gartner, by 2023, data literacy will become an explicit and necessary driver of business value, demonstrated by its formal inclusion in over 80% of data and analytics strategies and change management programs*. However, there remain some roadblocks on the path to data literacy.
One is the obvious influx of data that organisations are facing. In 2020, 64.2 zettabytes of data was created or replicated. Moreover, the amount of digital data created over the next five years will be greater than twice the amount of data created since the advent of digital storage.
Accessing and analysing this data is central to an organisation’s innovation and agility – both critical in the current disruptive landscape. But alongside the sheer volumes of data, many enterprises must contend with disparate pockets of data, siloed across the organisation in applications and systems.
At the same time, current technology limitations are preventing easy and cost-effective access to that data.
In the era of big data, legacy systems had to be provisioned and configured, and the information entered into databases. This could take months, inhibiting access to vital and time-sensitive data. And because of the cost and complexity associated with legacy technology, few organisations can afford true democratisation of information.
These are all gates that stand in the way of data democratisation, says Thomas Hazel, CTO & founder at ChaosSearch.
“These gates – whether time, cost or complexity – are preventing companies from being data literate,” says Hazel. “You can have hundreds of people in your organisation, and it takes weeks, months, and even years to stand up infrastructure to access the data. Data is the lifeblood of organisations, and when it takes weeks and months to get access to it, it’s a problem.
“You need to have a different philosophy as to how data is consumed, stored, managed and accessed.”
The ChaosSearch approach to moving, storing, organising and providing access to data quickly and efficiently is based on a data lake philosophy.
Hazel says the concept of a data lake is no longer bound by the time, complexity and cost constraints associated with some big data technologies from a few years ago. Cloud storage is the enabler of this lake philosophy.
Now, cloud object storage – as pioneered by Amazon Web Services (AWS) and serving as the foundation of all the cloud providers – is the simplest and most secure way to store data, one that can scale infinitely yet cost-effectively.
“A lake or cloud storage makes it so easy to stream data in. There’s no complexity of standing anything up. There’s no schema to structure it in a format, it can just be consumed. You just set it up and forget it,” says Hazel.
At the same time, he notes that many enterprises want a centralised way to identify what data they have and get access to it. The problem is, there aren’t solutions that can take advantage of a lake philosophy.
ChaosSearch is a new way to represent information. Supporting multi-model data access methods, ChaosSearch makes data simultaneously available through Elastic, SQL and, in future, machine learning APIs. The ChaosSearch Data Lake Platform can connect to and index all data within a customer’s own cloud storage environment – making it fully searchable and immediately available for analysis with existing data tools.
“We have built the technology to remove those gates to democratise that information because those old architectures, by definition, can’t,” explains Hazel.
“This combination of innovation with a new architecture and a new philosophy means what used to take months, or maybe years, to build out at petabyte scale, now takes weeks or even maybe a day to stand up. That to me is a huge democratisation move.”
The key difference between the old and new ways of approaching data lakes is that they can now be activated to generate everyday business value.
“A lot of cloud storage platforms are archiving data for security or compliance. If their siloed databases fall down, they send it to the lake. We had to tell customers that they could activate it as their primary analytical source,” says Hazel.
“We tell our customers: ‘Those use cases you’re running for log analytics or for security, or you’re using maybe Snowflake for business analysis, what if you had one platform with a lake philosophy that we activate to perform those use cases without changing a thing?’”
Additionally, because the platform plugs directly into enterprises’ existing cloud object storage, they can have role-based access control for different departments and users, with access rights granted immediately. There is no need for a data engineer to wrangle the data, a team of engineers to format it, or a database administrator to define schema and relational tables and provide back controls.
“You could self-serve, and once you can self-serve the data is democratised,” says Hazel. “You have to rethink how you manage information, as well as reinvent how to access the information.”
IDC says that by 2022, 90% of corporate strategies will explicitly mention information as a critical enterprise asset and analytics as an essential competency. Removing any gates to data literacy will be imperative to an organisation’s agility and growth.
“If your business isn’t adopting this new philosophy, you’re in trouble,” says Hazel. “If you can’t access information, you’re going to be left out. The time is now. You can start sending your data to your lake, without having to make a choice, it’s just a philosophy choice.”
* Gartner, 10 Ways CDOs Can Succeed in Forging a Data-Driven Organization, Mike Rollings, Alan D. Duncan, Valerie Logan. Refreshed 15 October 2020. Published 22 May 2019. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the US and internationally and is used herein with permission.
For more information please visit chaossearch.io