Questioning the role of AI in exam marking

The application of machine learning to exam marking might save time and money, but some educational scientists think it could also change the nature of assessment itself


In January 2020, Ofqual invited schools to submit student essays for a research project exploring the potential of artificial intelligence (AI) in exam marking. In the accompanying blog, the exam regulator reassured teachers and pupils that this was just a preliminary test, adding: “We wouldn’t suddenly see AI being used at scale in marking high-profile qualifications overnight.”

Just seven months later, prime minister Boris Johnson was blaming a “mutant algorithm” for an exam fiasco in which almost 40 per cent of A-level grades in England were downgraded, hitting many high achievers from disadvantaged backgrounds. That led to the AI marking study being put on hold.

This was despite the A-level algorithm being based on statistical methods rather than AI, not to mention that it was attempting the impossible: generating exam results without any exams having taken place.

Still, in the public’s mind, it was all part of the same problem: a sudden and unsettling ceding of power to opaque, machine-led systems with real-world implications for young people’s futures. As Robert Halfon, Conservative MP and chairman of the education select committee, says: “What Ofqual needs now is a period of long reflection and internal examination rather than an AI revolution.”

Use of algorithms and AI

Algorithms, statistics, data science and AI are already widely used in education. Ofqual itself has been using algorithms for years to offset grade inflation and smooth out regional discrepancies, without any public fuss or worry.

AI is used in plagiarism detection, exam marking and tutoring apps that give real-time feedback, such as On-Task and Santa for TOEIC (Test of English for International Communication) in South Korea, which has more than one million subscribers and appears to improve student test scores in just 24 hours using a machine learning-based algorithm.

In America, Bakpax, an AI-driven platform that auto-grades students’ work and is free and compatible with Google Classroom, has proved popular with teachers during the pandemic. Its marketers promise teachers “more time for your students or yourself” and the ability to “provide students with instant feedback when they’re still most engaged”, along with performance insights into which topics students find easier or more challenging.

Dee Kanejiya, founder and chief executive of Cognii, wants to help correct what he sees as an over-reliance on multiple-choice questions in US assessments. His AI-based platform uses natural language processing to assess longer passages of text, which have traditionally been harder for AI to grade accurately.

He believes multiple-choice questions do not help students in the real world and that the format favours boys over girls. But marking longer answers is time-consuming for teachers and therefore expensive, which is where he hopes Cognii can help.

Kanejiya is excited about the potential of AI to free up teachers from repetitive tasks such as marking, though he insists it isn’t about replacing them. “You get more time for that intimate relationship between faculty and students if teachers are not grading,” he says. “They can spend more quality time with the students, time for the emotional side of things, which they’re good at.” 

He also thinks cloud-based AI systems such as Cognii could play a crucial role in improving the access and affordability of education globally, especially in countries which suffer teacher shortages.

Being aware of data bias

But the potential labour and cost benefits of using AI in education inevitably come with some downsides. Last year there was a story about students gaming an AI marking system by typing lots of keywords into otherwise incoherent sentences and scoring full marks. Kanejiya says Cognii has checks in place to prevent that type of abuse. “We have factored in syntax and semantics to the system so that couldn’t happen,” he says.
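To see why keyword stuffing can work, consider a toy marker that only checks whether mark-scheme terms appear in an answer. This is an illustrative sketch, not how Cognii or any real product works; the mark scheme, the answers and the function name keyword_score are all invented for the example.

```python
import re

# Hypothetical mark scheme: key terms a naive marker might look for.
KEY_TERMS = {"photosynthesis", "chlorophyll", "sunlight", "glucose"}


def keyword_score(answer: str) -> float:
    """Return the fraction of key terms that appear anywhere in the answer."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    return len(KEY_TERMS & words) / len(KEY_TERMS)


coherent = (
    "Photosynthesis uses chlorophyll to capture sunlight, "
    "which the plant converts into glucose."
)
keyword_salad = "glucose sunlight banana chlorophyll photosynthesis table"

# Both answers contain every key term, so the naive marker cannot tell them apart.
print(keyword_score(coherent))
print(keyword_score(keyword_salad))
```

Because the marker ignores word order and meaning, the incoherent answer scores as highly as the coherent one. That is the gap that checks on syntax and semantics are meant to close.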

Algorithmic and AI bias is a real concern as well. We expect these models to be neutral and impartial, but the data we feed them means they are often subject to many of society’s existing biases and can discriminate against certain user groups. 

Hansol Lee and Renzhe Yu are postgraduate students at the Cornell Future of Learning Lab and experts in algorithmic fairness. “Machines learn historical principles and rules, and therefore learn what to apply to the future,” says Yu. “But that historical data will contain inequalities, such as students of colour have had lower achievement in the past or black students don’t learn maths. That simple rule could make the system recommend those students don’t learn maths.”

Bias can also occur if an AI system is trained on a dataset that has less data for a certain student group. It’s a data representation problem that isn’t deliberate, but nonetheless exists. “There might not be a quick-and-easy fix,” says Lee. “But it’s important to be aware of the problem so you can find other ways to make the system less biased.”
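One simple way to become aware of a representation problem is to break a system's results down by student group rather than looking only at the overall figure. The sketch below is a hypothetical audit, not a method described by Lee or Yu; the records, group labels and the function name audit_by_group are invented for illustration.

```python
from collections import defaultdict

# Invented records: (student_group, model_mark, human_mark).
records = [
    ("group_a", 7, 7), ("group_a", 5, 5), ("group_a", 8, 8),
    ("group_a", 6, 6), ("group_a", 4, 5), ("group_a", 9, 9),
    ("group_b", 6, 8), ("group_b", 7, 7),
]


def audit_by_group(rows):
    """Return {group: (sample_count, share of model marks matching the human mark)}."""
    counts = defaultdict(int)
    matches = defaultdict(int)
    for group, model_mark, human_mark in rows:
        counts[group] += 1
        matches[group] += int(model_mark == human_mark)
    return {g: (counts[g], matches[g] / counts[g]) for g in counts}


for group, (n, agreement) in audit_by_group(records).items():
    print(f"{group}: n={n}, agreement={agreement:.2f}")
```

In this made-up data, the smaller group has both fewer samples and lower agreement with human markers, a pattern that an overall average would hide.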

In adaptive learning tests, which are often used in private school entrance examinations in the UK, students are exposed to a different question path according to each answer they give, which presents its own concerns. 

“One study found the algorithm can make a more accurate diagnosis of the student’s performance if they’re a quick learner on the more advanced path,” says Yu. “So, anything it recommends to the quick learner would be more appropriate, but using the same AI, the slow learner will start to suffer.”
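The branching that adaptive tests rely on can be sketched with a simple staircase rule: a correct answer leads to a harder question, a wrong answer to an easier one. This is an assumption made for illustration; real adaptive tests generally use more sophisticated statistical models, and the function name adaptive_path and the five difficulty levels are hypothetical.

```python
def adaptive_path(answers, start=3, lowest=1, highest=5):
    """Return the difficulty level of each question a student is shown.

    `answers` is a list of booleans (True = correct). After a correct
    answer the next question is one level harder; after a wrong answer
    it is one level easier, clamped to the [lowest, highest] range.
    """
    level = start
    path = []
    for correct in answers:
        path.append(level)
        step = 1 if correct else -1
        level = max(lowest, min(highest, level + step))
    return path


print(adaptive_path([True, True, False, True]))    # [3, 4, 5, 4]
print(adaptive_path([False, False, True, False]))  # [3, 2, 1, 2]
```

Two students who start on the same question quickly end up seeing different material, which is why the accuracy of the test on each path matters.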

Learning about learning

Last summer gave the British public an uncomfortable insight into the dangers of data-science modelling in exam marking. Dr Rose Luckin is professor of learner-centred design at University College London and director of EDUCATE, a hub for educational technology startups. Is she worried the A-level debacle will derail the use of AI in UK education? “It has set the cause back,” she says, but cautions against rejecting its use entirely because of concerns regarding algorithmic fairness.

Luckin adds: “To avoid AI because it’s too risky would be a huge shame, as there is lots of potential for schools and especially for disadvantaged learners.”

These benefits include a more tailored and adaptive assessment system centred on the individual learner, rather than the current one-size-fits-all model that favours a certain type of student who is good at exams. 

“At the moment, we assess what’s quite easy to assess,” says Luckin. “But AI lets us assess a number of things we can’t assess that are things society needs for the fourth industrial revolution, such as collaborative problem solving, which PISA [Programme for International Student Assessment] introduced a couple of years ago, metacognitive awareness, self-regulation; incredibly important things that boost learning.”

She says we could use AI to do continuous formative assessment, rather than one-off exams. “That could help us really understand the learning process, as well as the learning product, so it becomes a learning activity not just an assessment activity. You can learn about yourself as a learner, what your strengths and weaknesses are, where you need to focus more attention and what coping strategies work for you,” says Luckin.

One thing she doesn’t want is for AI in education to be reduced solely to auto-marking exams, which she thinks would be a missed opportunity. “Assessment will always be the tail that wags the dog in education,” she says. “It’s so important to the system, to the government, but also to parents, so I think there will be a strong focus on using AI in assessment.

“But what I fear is that we’ll invest money and skill and expertise in automating something that perhaps itself is not the right thing, rather than looking at how we could do this differently.”

Instead, Luckin would love AI to usher in a future where learners demonstrate for themselves what they’ve learnt. “My real dream is where the learners themselves say, ‘I think I should have this grade’ and bring out all the evidence built up over years to demonstrate why, showing they have understood themselves well enough to pull that together, which would tell you so much about that individual,” she says.

How could this be scaled for something as big, complex and life-defining as university admissions? “There will be ways of digitalising that,” says Luckin. “I imagine there would be some sort of digital gate or point through which a student passes and they demonstrate their credentials, and over time you would be able to automate that.

“We’re not miles away from this technically; you’d need broadband connectivity everywhere, but what is much harder is the human acceptance of it. At what point do you feel you can say to parents, ‘OK, we’re phasing out the exams now’? They’d say, ‘Well how does my child get to the next stage?’ And I completely understand that.”

The best approach would probably involve a hybrid transition period until you reached a point where people felt confident in the replacement as a “truer assessment” of an individual’s learning and strengths, “celebrating human intelligence and the non-cognitive skills which differentiate us from machines”, Luckin concludes.