Is synthetic data the key to better market research?

Not content with using AI to analyse feedback from consumer polls, market researchers hope it could eliminate human respondents from the process by generating the same replies they’d have given

Market-research companies have found AI to be a useful analytical tool, particularly for its ability to understand what consumers write on questionnaires and say in audio or video interviews. The technology can also reliably interpret their answers to reveal hidden insights. It can even suggest next steps.

But in the next wave of adoption, market researchers will test AI’s ability to generate synthetic responses of its own, effectively cutting human interviewees out of the equation. If their experiments prove successful, AI could provide near-instant, low-cost ‘consumer’ insights, reducing the need to conduct costly surveys and, potentially, enabling brands to reach lucrative niche markets.

To produce reliable responses, the technology must be able to understand the views of the target audience and provide results that match those elicited by traditional consumer research methods. The natural question at this stage of development is: can the synthetic data it will produce be trusted?

Can AI imitate real human responses?

Market researchers at Kantar have taken the first steps in answering this. They prepared a set of questions and compared real data drawn from human surveys with responses given by OpenAI’s GPT-4 large language model (LLM). The queries they used covered a wide range of matters, such as whether the price of luxury holidays is off-putting and whether a given piece of technology helps the owner to connect with people who share their interests.

When asked about more practical issues, GPT-4 gave similar answers to those provided by the human respondents. But the more nuanced questions, requiring greater emotional reflection, produced significant differences. 
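For readers who want to see what such a comparison might look like in practice, here is a minimal sketch in Python. It is not Kantar’s method, which the article does not detail. It assumes each question has a fixed set of answer options, and it scores the gap between human and synthetic answers using total variation distance, a standard measure chosen here purely for illustration. The function names and the sample answers are hypothetical placeholders.

```python
from collections import Counter


def answer_shares(responses, options):
    """Return the share of responses that fall on each answer option."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    return {opt: counts[opt] / total for opt in options}


def total_variation_distance(human, synthetic, options):
    """Half the sum of absolute share differences.

    0.0 means the two groups answered in identical proportions;
    1.0 means their answers do not overlap at all.
    """
    h = answer_shares(human, options)
    s = answer_shares(synthetic, options)
    return 0.5 * sum(abs(h[opt] - s[opt]) for opt in options)


OPTIONS = ["agree", "neutral", "disagree"]

# Placeholder answers invented for illustration only; not real survey data.
human_answers = ["agree"] * 6 + ["neutral"] * 2 + ["disagree"] * 2
synthetic_answers = ["agree"] * 9 + ["neutral"] * 1

gap = total_variation_distance(human_answers, synthetic_answers, OPTIONS)
print(f"Gap between human and synthetic answers: {gap:.2f}")
```

A score near zero on a question would suggest the synthetic answers track the human ones closely. A larger score would flag the kind of divergence the researchers saw on more emotionally nuanced questions.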

Such results were what you might intuitively expect, notes Jon Puleston, vice-president of innovation at Kantar’s profiles division. AI is good for some parts of market research, but it’s limited when asked to adopt the personas of diverse human audiences.

“It’s clear that there are risks to relying solely on synthetic data if you’re making a business decision that’s worth billions,” he says. “Real human insights still form the heart of good market research. A more realistic use case for synthetic data is as a tool to complement, rather than replace, traditional research – for instance, by boosting sample sizes in surveys, particularly for niche audiences.”

The experiments’ results so far indicate that the LLM’s outputs are only as good as the human-profiling data fed into it, notes Marius Claudy, associate professor of marketing at University College Dublin, who has been researching the impact of training on AI outcomes. 

The problem of AI bias

While the technology can provide a good analysis of qualitative research, such as understanding what someone has said or written, it’s less effective at understanding the emotions that underpin people’s responses. This leaves the notion that AI could ever make traditional market research obsolete open to question. 

“The issue will always be how meaningful the results are, particularly when you’re asking about unknown propositions, such as a product that has yet to launch,” says Gary Topiol, managing director at market research firm QuestDIY. “Getting responses will be fairly easy but, as with all new methods, understanding when they can be trusted will take time.”

Another big concern is the well-documented bias to which generative AI is susceptible – again, as a result of its training. For instance, researchers at Harvard have found that ChatGPT’s views and values are closely aligned with those of US citizens.

Claudy points out that “the more distant a country is from the US culturally, the lower the correspondence between the human responses there and ChatGPT’s. An LLM might be able to approximate the responses of the ‘average’ person on historical topics, but it might struggle to mimic the responses of certain subgroups or minorities accurately.”

A basic concern about AI models is that they are trained largely on views gathered from the internet. This can engender a Western, English-language bias and create an echo chamber too. While that’s a worry for Jeremie Brecheisen, managing partner of Gallup’s EMEA division, he thinks there’s an even bigger issue.

As every market researcher knows, consumers don’t always make the logical choices a computer would expect of them. For instance, we often buy goods based on a whim, rather than a logical assessment of their attributes and overall value. This is why it is important to ask real people questions that cover a range of emotional responses. It’s the answers to these questions that AI will struggle to mimic for the foreseeable future, he says.

“Our brains and emotions are highly complex, so it will require a lot of experimentation to understand whether AI can get close to replicating the results of human surveys,” Brecheisen says. “There’s a lot of interest in using synthetic data to cut costs, but that’s not a great reason when you don’t know if you can trust those answers.”

Concerns are not limited to whether future models can replicate real human responses. There are also legal considerations, warns Ben Travers, a partner specialising in IT matters at law firm Knights. While he shares researchers’ worries that AI bias may lead to poor outcomes, he is also troubled by the use of personal data found on the internet to build profiles.

“Businesses will need to ensure they have a clear legal basis for uploading any personally identifiable data to AI tools,” he says. “And all AI users must be alert to copyright issues. These apply to both the content fed into an AI and the content it produces. Just because this material is easily accessible does not mean that it’s lawful to copy it. Such content is not ‘fair game’ – copyright will enable the rights owner to control how it is used and disseminated.”

The future of AI in market research and retail decision-making is unclear. While the technology is undoubtedly a boon to those compiling surveys and interpreting responses, it remains to be seen whether it can reliably answer questions itself. The ultimate prize of having a system that can accurately predict which car will sell best among millennials in Peru, say, or how much sugar to remove from a soda for the Hungarian market seems to be the stuff of science fiction – for now, at least.