
During a quarterly earnings call last year, Alphabet’s chief executive, Sundar Pichai, revealed that AI writes a quarter of the company’s code. “This helps our engineers do more and move faster,” he claimed.
In March, Anthropic’s CEO, Dario Amodei, said as much as 90% of code would be generated by AI within months. Although his prediction has not come true, it’s clear that executives’ ambition to use AI systems for coding is growing. Data from Stack Overflow shows that 84% of developers use or are planning to use AI tools in their development process, up from 76% in 2024.
The advantages of doing so are obvious: shorter development cycles, higher productivity and better resource allocation; it’s often said that AI integration will allow humans to focus on the more challenging or creative aspects of their jobs. Recent trials across several UK government departments found that coders and tech engineers saved almost an hour a day by using AI assistants to help write code.
Most tech professionals trust AI systems
Such achievements have produced confidence in AI among tech professionals. A survey by Clutch revealed that more than half of senior software developers believe large language models (LLMs) can already code better than most humans.
But good code isn’t about quality alone – it’s about security too. Late last year, nearly 80% of software developers told security firm Snyk that they believe AI-based coding tools generate more secure code, and that they trust those AI systems just as much as their own colleagues. Fewer than six in 10 were concerned about vulnerabilities introduced by AI.
“Why is this? In short, it’s because engineers frequently think of security as a ‘bug’, and one thing generative AI is really good at is identifying and fixing bugs,” says Randall Degges, Snyk’s head of developer relations and community.
But analysis suggests that this confidence is misplaced.
Errors abound in AI-generated code
Jessica Ji is a senior research analyst at Georgetown University’s Center for Security and Emerging Technology (CSET). She says: “There’s an underlying belief that you can replace a worker perfectly – that an AI can do the exact job that a person does – but that’s generally not true, especially in software.”
What’s more, research by CSET shows that models may generate insecure code and create damaging feedback loops when their output is used to train future AI systems, and that the models themselves may be inherently vulnerable to attack and manipulation. Almost half of the code snippets produced by five different models tested by CSET contained bugs rated as ‘impactful’ and potentially open to malicious exploitation.
Veracode, an application security firm, has produced similar findings. Its analysis of a sample of AI-generated code shows that just 55% of it is free from known security vulnerabilities – and that large AI models do not significantly outperform small ones.
Java is the riskiest language for AI code generation, according to Veracode, with a security failure rate of more than 70%. Other common languages, such as Python, C# and JavaScript, carried failure rates between 38% and 45%. The research also found that LLMs failed to secure code against cross-site scripting and log injection in nearly nine in 10 cases.
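To make that finding concrete, here is a minimal, hypothetical Python sketch of the log injection flaw Veracode measured; it is illustrative only and not taken from the research. The unsafe version writes untrusted input straight into a log, letting an attacker forge entries, while the safer version strips control characters first.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("auth")

def log_login_unsafe(username: str) -> None:
    # Vulnerable: untrusted input is written straight into the log. A value
    # such as "alice\nINFO:auth:login succeeded for admin" lets an attacker
    # forge additional log entries (log injection).
    logger.info("login attempt for %s", username)

def log_login_safe(username: str) -> None:
    # Mitigation: strip newlines and other control characters before logging.
    sanitised = re.sub(r"[\r\n\x00-\x1f]", "_", username)
    logger.info("login attempt for %s", sanitised)
```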
Teams already have the tools to mitigate risks
According to research by Aikido Security, such security failures produce real-world impacts. A report by the security company revealed that 69% of US and European organisations had uncovered vulnerabilities in AI-written code, and one in five CISOs admitted their firm had suffered a major security incident caused by AI-generated code.
Mike Wilkes, the firm’s information security chief, adds: “No one knows who’s accountable when AI-generated code causes a breach. Developers didn’t write the code, infosec didn’t get to review it and legal is unable to determine liability should something go wrong. It’s a real nightmare of risk.”
Ji says traditional testing tools and rigorous processes can be used to mitigate such risks.
“The way we check for security doesn’t have to be new – we don’t have to reinvent the wheel for AI-generated code, we can just use our current process to check,” she says. “Many of our defences already exist.”
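Her point can be illustrated with a short sketch: the same automated gate that checks human-written code can simply be pointed at AI-generated code. This example assumes a Python project scanned with the open-source Bandit analyser; the path and the pass/fail policy are illustrative, not a prescribed setup.

```python
import subprocess
import sys

def scan_code(path: str = "src") -> int:
    """Run the same static security scan on all code, whether a human or an AI wrote it."""
    result = subprocess.run(
        ["bandit", "-r", path],  # recursive scan; exits non-zero if issues are found
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    return result.returncode

if __name__ == "__main__":
    # Treat any finding as a blocking failure, just as a CI pipeline would for
    # code written by a human colleague.
    sys.exit(scan_code())
```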
Alex Cowperthwaite, technical director of research and development in the cyber risk practice at Kroll, an advisory firm, recommends using project-wide prompts to ensure that all AI-powered coding tools inherit secure coding practices, such as reference architectures, design patterns and coding standards.
He also warns against ‘trust bias’. “As humans, if we repeatedly see good results, we learn to trust the process. This trust bias can be dangerous in the context of AI-generated code. Such code must maintain a risk-centric level of verification, including automated code scanning, adversarial review and red-team validation.”
‘A healthy dose of pragmatic scepticism’
NetMind.AI, an artificial general intelligence research lab and infrastructure start-up, uses AI-generated code widely, but it directs senior engineers to design and maintain the critical foundation covering identity, permissions, secret management and API contracts. The company packages these as paved-path libraries and templates, and its AI systems generate demo and application logic on top of this framework rather than from scratch.
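The sketch below is purely illustrative of that pattern, not NetMind.AI’s actual code: a small, human-maintained ‘paved path’ helper owns secret access, and AI-generated application logic is expected to call it rather than read credentials ad hoc.

```python
import os

class SecretNotConfigured(RuntimeError):
    """Raised when a required secret is missing from the environment."""

def get_secret(name: str) -> str:
    """Reviewed, single entry point for secret access.

    Generated code imports this helper instead of handling credentials itself,
    so secret management stays inside the human-maintained foundation.
    """
    value = os.environ.get(name)
    if value is None:
        raise SecretNotConfigured(f"secret '{name}' is not configured")
    return value

# AI-generated demo or application logic builds on top of the paved path,
# e.g. api_key = get_secret("PAYMENTS_API_KEY")  # hypothetical secret name
```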
Xiangpeng Wan, the company’s product lead and strategic research lead, says he now uses AI-generated code in all his projects, but notes that humans must remain in the loop to review its output and that developers should maintain a “healthy dose of pragmatic scepticism.”
Wan explains: “Before any code is committed, I routinely ask what edge cases may have been missed, whether the logic makes business sense and, most importantly, whether I fully understand what the code is doing.”
Still, for many, maintaining that healthy scepticism will require a shift in mindset. Given the enthusiasm for AI-generated code among C-suite leaders, developers would do well to make this shift quickly.