
In 2016, Tim Berners-Lee, the creator of the world wide web, released the Solid protocol – a web decentralisation project that is vital to realising his vision of a future internet where people retain full control of the data they share online. But to achieve this vision he needed scale. The following year, Berners-Lee joined forces with startup founder John Bruce to form Inrupt, a company that aims to create a dependable, scalable and enterprise-grade version of the protocol, with a view to bringing large organisations and governments on board.
Inrupt enjoyed some early success with its creation of digital data ‘wallets’ and e-government trials in Flanders, Belgium, where citizens now have full visibility over the data they own and how the government interacts with it.
However, generative AI, which is trained on large quantities of online information, now threatens to impede Inrupt’s plans. “We’ve got to keep data with the user, where the individual decides who can see their data and for what purpose and allow them to do it in an easy way,” says Bruce. “Otherwise we end up in a dystopian place where the large language models own us – and that’s frightening.”
Data and GenAI: untrackable, untraceable
The amount of data any one individual generates is gargantuan. Not only is our every click recorded, so too is our metadata – the data about our data. This includes information about when we are online, the location we’re in and who we are in contact with. This information was already being used to construct alarmingly detailed digital profiles about individuals.
But now, with the data-crawling bots that power generative AI, deepfakes and other simulacra, even more of our data is up for grabs. Data brokers increasingly treasure uniquely individual characteristics such as what we look like and how we sound. We might not even be aware of where this data is being used or for what purpose.
In server rooms and data centres, vast amounts of data train the large language models that power generative AI. But because this data is so vast and entangled, providers would be hard pressed to tell you exactly which data informed their AI’s outputs.
According to Bruce, Inrupt’s data wallets could help solve this challenge by turning it on its head. “This technology absolutely inverts the problem, makes it trackable and makes for a very much more sane world,” he says.
What are Inrupt’s data wallets?
Each Inrupt wallet can contain transactional data (such as purchases made), private data stored locally, and public data that is available online.
“This technology allows you to give your customer or citizen a digital wallet and then they get to control who or what can have access to the data in that wallet,” says Bruce.
The premise flips the problem of mass data collection by forcing organisations to gain consent from the user for their data before they can use it. By collating this information into a single data store, the user can then sign off on how their data is used at a granular level, even down to which LLMs can be trained on their data.
These wallets could also simplify compliance for businesses, allowing for much easier data audits.
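In rough TypeScript terms, that purpose-bound consent and audit trail might look something like the sketch below. Every name here is an illustrative assumption for this article, not Inrupt’s actual wallet API.

```typescript
// Illustrative model of a personal data wallet with granular, purpose-bound consent.
// Hypothetical names throughout; this is not Inrupt's real API.

type DataCategory = "transactional" | "private" | "public";
type Purpose = "analytics" | "personalisation" | "llm-training";

interface ConsentGrant {
  requester: string;        // e.g. "retailer.example" or "llm-provider.example"
  category: DataCategory;   // which slice of the wallet the grant covers
  purpose: Purpose;         // what the requester may use it for
  grantedAt: Date;
}

class DataWallet {
  private grants: ConsentGrant[] = [];
  private auditLog: string[] = [];

  // The owner signs off on a specific requester, category and purpose.
  grant(requester: string, category: DataCategory, purpose: Purpose): void {
    this.grants.push({ requester, category, purpose, grantedAt: new Date() });
    this.auditLog.push(`GRANT ${requester} ${category} ${purpose}`);
  }

  // An organisation must hold a matching grant before it can touch the data.
  canAccess(requester: string, category: DataCategory, purpose: Purpose): boolean {
    const allowed = this.grants.some(
      g => g.requester === requester && g.category === category && g.purpose === purpose
    );
    this.auditLog.push(`${allowed ? "ALLOW" : "DENY"} ${requester} ${category} ${purpose}`);
    return allowed;
  }

  // The audit trail is what would make compliance checks straightforward.
  audit(): string[] {
    return [...this.auditLog];
  }
}

// Usage: the owner permits one LLM provider to train on public data only.
const wallet = new DataWallet();
wallet.grant("llm-provider.example", "public", "llm-training");
console.log(wallet.canAccess("llm-provider.example", "public", "llm-training"));  // true
console.log(wallet.canAccess("llm-provider.example", "private", "llm-training")); // false
console.log(wallet.audit());
```

The point of the sketch is the shape of the interaction: consent is recorded per requester and per purpose, and every decision leaves a trace that an auditor, or the wallet owner, can inspect later.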
Data provenance in AI
This access to our data can play out in strange ways, such as when voiceover artist Gayanne Potter caught her own voice being used on Scotrail’s trains. Potter claims the rail operator has used her voice to create its new AI-powered train announcer, called Iona, without her consent. According to Potter, she had completed voiceover work for a Swedish technology company in 2021 – for what she believed was accessibility software – and was later surprised to find her likeness announcing passenger information on the Scottish rail network.
“After the years that I’ve gone through to try to have my data removed, it’s still being used,” Potter told the BBC. “I also have to look on social media and see people mocking it, berating it. They don’t realise it’s actually a real person who’s been put through a dreadful voice app.”
Potter has since asked her lawyers to request that Scotrail remove what she believes is an AI copy of her voice from its announcements.
The case demonstrates how the onus is on the individual to resolve issues around data consent. This trend is eroding trust in digital platforms. Potter’s dilemma brings to light the importance of data provenance in relation to AI.
The enterprise AI web crawler problem
Consumers frequently use AI to hunt for bargains. But the web crawlers deployed by those AI models are often banned by online shops because they appear similar to malicious bots. This has led to a situation where AI providers are trying to strike agreements with individual merchants, a complicated and time-consuming process for all involved. Global payments firm Visa is now trying to solve this problem – and Inrupt’s technology plays a key role.
“The LLMs found that, when they went to merchant websites, they were considered a security attack,” says Bruce. “Merchants have been blocking them. But wherever they resolved that, another problem kicked in.”
When these LLMs tried to compare prices between online stores, they also found that each merchant displays their goods in different ways. So the AI had to figure out how to navigate different stores and present those price comparisons back to the consumer.
Visa is now trying to create a single standard, which AI companies access via a set of APIs.
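To see why a shared standard helps, the sketch below imagines a single normalised “offer” shape that every merchant maps its own catalogue onto, so an AI shopping agent reads all stores the same way. The interface and field names are assumptions for illustration only, not the actual Visa specification.

```typescript
// Hypothetical common schema an industry standard might define. Illustrative only;
// this does not describe Visa's or Inrupt's real APIs.

interface Offer {
  merchant: string;
  sku: string;
  title: string;
  price: number;      // minor units, e.g. pence
  currency: string;   // ISO 4217 code
  inStock: boolean;
}

// Each merchant maps its own catalogue format onto the shared Offer shape...
function toOffer(
  merchant: string,
  raw: { id: string; name: string; pricePence: number; available: boolean }
): Offer {
  return {
    merchant,
    sku: raw.id,
    title: raw.name,
    price: raw.pricePence,
    currency: "GBP",
    inStock: raw.available,
  };
}

// ...so comparing prices across stores becomes trivial for the agent.
function cheapest(offers: Offer[]): Offer | undefined {
  return offers.filter(o => o.inStock).sort((a, b) => a.price - b.price)[0];
}
```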
Inrupt gives consumers the option to decide which AIs they allow to see the query, as well as where and how the data for that query is used. This means consumers can enjoy the ease of service offered by AI providers, without giving away all of their data.
“It creates a trusted environment,” says Bruce. “You can run a query so that the LLM doesn’t get to take the data away – it just operates in your wallet, under your remit and then none of that data moves out of there.”
Meanwhile, an accompanying AI feature from Inrupt called Charlie serves as a kind of AI assistant that works for the user’s interests over those of the big-tech firms. “Charlie is an AI that lives on your wallet and it can intermediate with the LLMs out in the big, wide world,” says Bruce. “It basically acts as a gatekeeper, sitting astride your data and deciding with you what to allow the LLMs to see, to give you the kind of value you want.”
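A minimal sketch of that gatekeeper pattern, assuming a hypothetical local query interface, might look like this. The class and method names are illustrative and do not reflect Inrupt’s or Charlie’s real interfaces; the key idea is that only the derived answer leaves the wallet, never the raw records.

```typescript
// Illustrative gatekeeper: queries are answered inside the wallet and only the
// computed result is returned to an approved external model. Hypothetical names.

interface PurchaseRecord {
  item: string;
  pricePence: number;
  date: string;
}

interface WalletPolicy {
  allowedModels: Set<string>;   // which LLMs the wallet owner has approved
}

class Gatekeeper {
  constructor(
    private purchases: PurchaseRecord[],   // raw data is never exposed directly
    private policy: WalletPolicy
  ) {}

  // The external model only ever receives the aggregate answer, not the records.
  handleQuery(modelId: string, query: "total-spend" | "item-count"): string {
    if (!this.policy.allowedModels.has(modelId)) {
      return "Request refused: this model is not approved by the wallet owner.";
    }
    if (query === "total-spend") {
      const totalPence = this.purchases.reduce((sum, p) => sum + p.pricePence, 0);
      return `Total spend: £${(totalPence / 100).toFixed(2)}`;
    }
    return `Items purchased: ${this.purchases.length}`;
  }
}

// Usage: an approved model gets an aggregate answer; an unknown one gets nothing.
const gatekeeper = new Gatekeeper(
  [{ item: "kettle", pricePence: 2499, date: "2024-05-01" }],
  { allowedModels: new Set(["approved-llm.example"]) }
);
console.log(gatekeeper.handleQuery("approved-llm.example", "total-spend"));
console.log(gatekeeper.handleQuery("unknown-llm.example", "total-spend"));
```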
Data control and the UK’s AI copyright chaos
A greater level of ownership over their data could also help creators push back against controversial plans, such as the upcoming AI opt-out model in the UK. Under these plans, creators would be forced to opt out of having their content scraped by large language models rather than opting in to having their work used by AI platforms. It’s a contentious idea that tasks creators with protecting their own work and has drawn ire from artists including Paul McCartney, Kazuo Ishiguro, Shirley Bassey and Elton John, who branded the government “absolute losers”.
“We’ve talked to a couple of very well-known artists who are really concerned about this,” says Bruce, who adds that such a data-protection model for creatives would be feasible with Inrupt and its wallets. While Inrupt hopes to protect individuals’ data, its business model is reliant on securing contracts with enterprises and governments.
So, in an ideal world, says Bruce, major labels could approach Inrupt and place the data from their entire roster of artists into a wallet. This would then prevent AI companies from accessing any of their data without their explicit consent.
However, he adds, while protecting creatives and stopping GenAI platforms from pinching content is all “conceptually” doable, the company is “modest in size”. So ultimately, that vision of putting data control back in the hands of individuals will depend on enterprises seeing the value in doing so. But given all the noise made in protest at the AI opt-out, perhaps that day won’t be too far away.