Cut through data chaos to prepare for effective generative AI

How can organisations get their house in order before they start experimenting with emerging technology?

Where you find an emerging technology, you will always find excitement – as people dream of its potential to drive real business value and growth. However, to harness the true potential of new technology, organisations need to ensure they have their data ducks in a row. 

This is particularly true of generative AI, where leaders need to pay as much attention to the data needed to make it work as to the value it can bring to the organisation. Those who take the time to get this right are more likely to see vast benefits. According to AWS, 93% of chief data officers (CDOs) agree that a data strategy is crucial to creating business value from generative AI. Despite this, 45% say they don’t have the right data foundation in place. 

Farhin Khan, head of UKI Databases at AWS, explains that data chaos makes it harder to address generative AI risks later, such as ethical concerns, bias and hallucinations. She says that the CDO needs to lead organisational change. 

“As a relatively new C-level position, the CDO role has evolved tremendously in the last decade,” Khan explains. “The key challenges and barriers faced are mostly organisational and behavioural, rather than technological, and related to culture, people and process within the organisation. 

“CDOs need to gauge their organisation’s preparedness for data initiatives and choose the most straightforward route to desired behaviours and business outcomes. Culture change is challenging to achieve and quantify, but it is a necessity for successful adoption of a data-driven approach.” 

Many other factors can contribute to an inadequate initial adoption of generative AI, not least budgets and skills. AWS research found that 55% of CDOs cited having “insufficient resources” to accomplish their goals while half spoke of a “lack of data literacy or understanding” in their organisation.

Khan advises that developing a “modern data strategy” can overcome these barriers. “This is an agile plan of aligned actions spanning mindset, people, process and technology that accelerates value creation,” she explains, “using data in direct support of strategic business objectives.” 

She adds: “In the past, organisations would create a comprehensive strategy document spanning three to five years, which often remained untouched and unread. In today’s dynamic digital landscape, a modern data strategy should be regularly updated to reflect evolving realities and rapid changes internally and externally.” 

Developing meaningful content 

A data strategy for generative AI adoption needs “data quality” at its core, Khan believes. “It forms the foundation for accurate learning, unbiased outputs and the generation of meaningful content,” Khan says, “ultimately contributing to the overall effectiveness and trustworthiness of generative AI applications.” 

Achieving data quality relies on several actions; these include effective data checks alongside leveraging purpose-built tools across all aspects of data pipelines and processes. Khan’s experience shows this is often not achieved due to a failure to work backwards from customer use cases or because speed has been prioritised over quality. 

Data must be both “relevant to the application and accurate/free from errors”, Khan says. She explains: “While noisy data can lead to poor model performance, inaccurate data will result in misleading outputs. 

“It is crucial to ensure any underlying data set is free from biases for fairness in your generative AI systems; inconsistent data can confuse the model and hinder its ability to learn patterns.”
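
As a rough illustration of the kind of "effective data checks" described above, the following is a minimal Python sketch, not a method attributed to Khan or AWS. It assumes records arrive as plain dictionaries; the function name `check_quality`, the field names and the sample records are all hypothetical. It counts missing values, exact duplicate records and inconsistently written labels, three of the simplest signals of noisy or inconsistent data.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names are illustrative assumptions.
SAMPLE_RECORDS = [
    {"id": 1, "text": "Refund policy is 30 days", "region": "UK"},
    {"id": 2, "text": "", "region": "uk"},
    {"id": 3, "text": "Delivery takes 3 days", "region": "IE"},
    {"id": 3, "text": "Delivery takes 3 days", "region": "IE"},
    {"id": 4, "text": "Returns need a receipt", "region": None},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def check_quality(records, required_fields, label_fields):
    """Return a basic report: missing values, exact duplicates, inconsistent labels."""
    missing = Counter()
    seen = set()
    duplicates = 0
    variants = {field: defaultdict(set) for field in label_fields}

    for record in records:
        # Count required fields that are absent or empty.
        for field in required_fields:
            if is_missing(record.get(field)):
                missing[field] += 1

        # Detect exact duplicates by comparing every key/value pair.
        key = tuple(sorted((k, str(v)) for k, v in record.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

        # Group label spellings that differ only by case or surrounding spaces.
        for field in label_fields:
            value = record.get(field)
            if not is_missing(value):
                variants[field][str(value).strip().lower()].add(str(value))

    # Keep only labels that were written in more than one way.
    inconsistent = {}
    for field, groups in variants.items():
        clashes = {norm: sorted(v) for norm, v in groups.items() if len(v) > 1}
        if clashes:
            inconsistent[field] = clashes

    return {
        "total": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
        "inconsistent": inconsistent,
    }


if __name__ == "__main__":
    print(check_quality(SAMPLE_RECORDS, ["text", "region"], ["region"]))
```

Checks like these only catch surface problems. Judging whether data is relevant to the application or free from bias still depends on the customer use case and on human review.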

Other barriers to data quality might include outdated data governance and management policies, a culture that discourages shared access to business-critical information and a lack of agility, Khan suggests. 

C-suites can help their CDOs overcome these barriers by taking a “collaborative, hands-on approach” to fostering change, she adds, with everyone, from top to bottom, “understanding the value of data and its role in decision-making”. 

Agreeing to the necessary budget at board level is also a critical moment in delivering a scalable data infrastructure. Khan believes the budget should include legacy system upgrades and cloud adoption. She adds: “C-suites should provide resources for budget and personnel for data projects, data literacy programmes and hiring so the CDO can build a skilled data team.” 

A step-by-step approach 

The journey towards adopting generative AI can begin with the smallest step, which can then be added to over time with incremental change. Key steps on this path include working backwards from customer challenges, automating as you go, and establishing your values and ethics guardrails, Khan says. 

She concedes there is no “one-size-fits-all” approach to a technology solution. Instead, a modern data architecture “giving you the best of data mesh, data lakes and purpose-built data stores” is the answer. 

“It lets you store any amount of data you need at a low cost, and in open, standards-based data formats,” she adds. “It isn’t restricted by data silos and lets you empower people to run analytics or machine learning using their preferred tool or technique. Also, it lets you securely manage who has access to the data.” 

Technology solutions are also evolving alongside generative AI developments. For example, AWS recently announced a Guardrails feature for Amazon Bedrock, its service for building and scaling generative AI applications, to help customers implement safeguards customised to their generative AI applications. 

Building a generative AI model from scratch not only requires a large volume of high-quality data, but it also needs fine-tuning. A modern data strategy will consider the nuances and intricacies when training multi-modal models so the differences in data types – text, image, audio or video – are understood. 

“CDOs should have the authority and resources to establish and enforce data quality standards, security measures and compliance protocols. This not only ensures the reliability of the data but also mitigates potential risks,” Khan says. 

A focus on mindset, people, process and technology can also be used as a framework to avoid data chaos. This is most effective when coupled with a C-suite commitment to recognising, acknowledging and rewarding progress. 

Khan contends this culture of acknowledgement and reward is vital as successful data-driven initiatives must celebrate incremental wins. “By combining these elements, executives can provide the necessary support for CDOs to swiftly and efficiently establish the robust data foundations needed for harnessing generative AI’s transformative power,” she adds. 

Another aspect to be recognised is the distinct but complementary roles of data producers, data technology teams and data consumer teams. Doing so can create an agile environment that innovates faster while adhering to data security rules and regulatory considerations. 

“CDOs are most exposed right now as they face new challenges,” Khan cautions. “Integrating emerging technologies like generative AI into current data strategy initiatives and existing data ecosystems is a key focus area.” 

Getting strong data foundations in place may seem like a big project, but organisations that take on the challenge will put themselves head and shoulders above the competition when it comes to leveraging the power of generative AI.

For more information, please visit