Building facial recognition technologies on limited datasets, with images that don't represent society as a whole, has prompted an ethical debate about how they should evolve.
With technology capable of matching billions of fingerprints a second, scanning retinas with infrared light to record the unique pattern of their blood vessels, and cross-checking faces in real time against databases of millions, biometrics ethics has become an increasingly important area.
Governments, police forces and enterprises across the world have, for various reasons, adopted biometric technology to identify individuals based on biological and behavioural characteristics, particularly during the COVID-19 pandemic, when in-person verification may not be possible.
Yet this pioneering field has been mired in concerns surrounding privacy, human rights and systemic prejudice, with some biometric technologies, such as facial and voice recognition, shown to produce racial and ethnic bias that could see innocent people jailed or refused essential welfare benefits.
Carly Kind, director of the Ada Lovelace Institute, an independent research body that monitors artificial intelligence (AI) and data ethics, says this is largely down to flawed or limited datasets used by companies.
“It comes down to bias in the data that informs the system,” says Kind. “This originates from unrepresentative datasets and this may be because the developer of the technology hasn’t ensured there is a proper representation of ethnicities, genders or social classes.”
The empirical evidence is stark. A groundbreaking study published in December 2019 by the US-based National Institute of Standards and Technology, which analysed 189 software algorithms from 99 developers, the majority of the industry, found higher false positive rates for Asian and African-American faces than for Caucasian faces, often by a factor of ten to one hundred.
It followed research in 2018 by the MIT Media Lab, a research laboratory at the Massachusetts Institute of Technology, which found that commercial facial analysis systems from Microsoft, IBM and China's Megvii had error rates of no more than 0.8 per cent when classifying the gender of lighter-skinned men, but of up to 34.7 per cent for darker-skinned women.
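To put those figures in proportion, the short sketch below computes per-group error rates from a list of (group, correct) results, then the ratio and the gap between the worst- and best-served groups. The function names `error_rates_by_group` and `disparity` are hypothetical, written only for illustration; the sole real data used are the two headline error rates from the MIT study.

```python
from collections import defaultdict


def error_rates_by_group(results):
    """Return the error rate for each group.

    `results` is an iterable of (group, was_correct) pairs, one per image
    the system processed.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, was_correct in results:
        totals[group] += 1
        if not was_correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def disparity(rates):
    """Return (ratio, gap) between the worst and best error rates.

    The gap is a difference between two rates, so it is read in
    percentage points rather than per cent.
    """
    best = min(rates.values())
    worst = max(rates.values())
    ratio = float("inf") if best == 0 else worst / best
    return ratio, worst - best


# The two headline error rates from the 2018 MIT Media Lab study, as fractions.
reported = {"lighter-skinned men": 0.008, "darker-skinned women": 0.347}
ratio, gap = disparity(reported)
print(f"{ratio:.0f} times higher, a gap of {gap * 100:.1f} percentage points")
```

Run as written, this reports an error rate about 43 times higher for darker-skinned women, a gap of 33.9 percentage points. Stating both the ratio and the gap avoids the ambiguity of describing a difference between two percentages "in per cent".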
Why do ethnic minorities suffer bias?
MIT researchers attributed the disparity in performance to the image datasets used to develop these facial recognition technologies, which they found to be 77 per cent male and 83 per cent white.
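An audit that surfaces this kind of skew can be sketched in a few lines: count each label's share of a dataset and flag any label that falls well short of a target share. The helper names, the 50/50 target and the five-percentage-point tolerance are assumptions chosen for illustration, not a method the MIT researchers describe, and the 100-image dataset is hypothetical, simply mirroring the 77 per cent male split.

```python
from collections import Counter


def composition(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def under_represented(labels, target, tolerance=0.05):
    """Return, sorted, the labels whose share falls short of `target` by more than `tolerance`.

    `target` maps each label to the share it should have in a representative
    dataset, which is a judgement the auditor has to make. Labels absent
    from the data count as having a share of zero.
    """
    shares = composition(labels)
    return sorted(
        label
        for label, wanted in target.items()
        if shares.get(label, 0.0) < wanted - tolerance
    )


# Hypothetical 100-image dataset mirroring a 77 per cent male split.
gender_labels = ["male"] * 77 + ["female"] * 23
print(composition(gender_labels))
print(under_represented(gender_labels, {"male": 0.5, "female": 0.5}))
```

Choosing the target is the hard part: a check like this only measures distance from whatever "representative" mix the auditor specifies, which is one reason a perfectly unbiased dataset is so difficult to reach.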
“It’s very difficult to reach a point where you have a completely objective dataset that is perfectly representative and unbiased,” says Kind. “But it can be dramatically reduced.”
However, Pawel Drozdowski, a researcher at Germany’s National Research Centre for Applied Cybersecurity, the largest of its kind in Europe, believes the causes of biometric bias are potentially more complex.
“There is a perception that with more training data we could eradicate bias and to some extent that’s true, but not completely,” he says. “Because isolating the actual source of bias is very challenging.”
According to Drozdowski, behavioural cues and variables such as lighting, distance from the facial recognition sensor and whether a person is wearing make-up can seriously impact the efficacy of biometrics.
The dearth of balanced data available for companies is also a stumbling block for improving biometrics ethics. Data protection laws, such as the European Union’s General Data Protection Regulation, supported by a wave of public opinion, have limited access to and handling of personal information. “We just don’t have enough data,” adds Drozdowski.
Preventing discrimination or systemic bias in biometrics
UK-based biometrics company Onfido has tried to improve biometrics ethics by building its identity-fraud algorithms only from data that its clients provide, with their consent.
“We don’t purchase any data, we don’t scrape the internet for any data, we don’t pay people to generate data,” says Susana Lopes, the company’s director of product. “Because we are tied to whatever our clients agree to share with us, this means our database is representative of our client base.”
Onfido’s concept is to use AI-based technology to assess whether a user’s official, government-issued ID is genuine or fraudulent and then compare it against facial biometrics of the user, in theory verifying their identity and physical presence.
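Onfido has not published how its comparison works, so what follows is only a generic sketch of a two-step check of this kind. It assumes the face on the ID and the user's selfie have each already been turned into a numeric face embedding by some face-recognition model. The names `verify_identity` and `cosine_similarity`, the toy three-number embeddings and the 0.6 match threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else dot / norm


def verify_identity(document_is_genuine, id_embedding, selfie_embedding, threshold=0.6):
    """Hypothetical two-step check.

    Step 1: reject the attempt if the ID document failed its authenticity check.
    Step 2: accept only if the selfie's face embedding is similar enough to the
    embedding of the face printed on the ID.
    """
    if not document_is_genuine:
        return False
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold


# Toy 3-number embeddings; a real model would produce much longer vectors.
id_face = [0.9, 0.1, 0.3]
print(verify_identity(True, id_face, [0.8, 0.2, 0.3]))    # similar face: True
print(verify_identity(True, id_face, [-0.2, 0.9, -0.4]))  # different face: False
print(verify_identity(False, id_face, id_face))           # forged ID: False
```

The threshold is where the trade-off sits: lowering it lets more genuine users through but also more impostors. The NIST findings imply that a single threshold applied to everyone can yield different false match rates for different demographic groups.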
Lopes says demand for Onfido’s services has surged in recent weeks with online healthcare work quadrupling. But even with such growth, she concedes: “It’s going to take longer to acquire datasets that are as balanced as they need to be.”
Prioritising biometrics ethics with diverse images
Other companies have taken more innovative and costly routes to reduce and prevent systemic bias in biometric systems.
Brent Boekestein, chief executive of Vintra, a California-based video analysis company, says its custom training database was created “from the ground up” in an effort to mitigate any potential bias.
“Most training datasets only use photos of celebrities because they’re easier to find,” says Boekestein, in reference to MS Celeb, a dataset of ten million face images harvested from the internet. “But these aren’t representative of the world and tend to be beautiful; they tend to have high cheekbones and they tend to be younger.”
Instead, Vintra’s dataset has been constructed with a diverse selection of public figures from around the world, such as African first ladies, thereby avoiding security or privacy concerns. It contains more than 20,000 identities taken from 76 countries, equally balanced across ethnic groups. “It took us a long time and cost us a lot of money, but we built a more holistic view of society,” says Boekestein.
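Vintra has not described its exact procedure. One simple, generic way to enforce an equal balance across groups is to down-sample every group to the size of the smallest, sketched below with a hypothetical `balance_by_group` helper and made-up records. The catch is that down-sampling throws data away, which is why building a dataset from the ground up with more identities for under-represented groups, as Vintra says it did, takes longer and costs more but preserves the dataset's size.

```python
import random
from collections import defaultdict


def balance_by_group(records, group_of, seed=0):
    """Down-sample so every group contributes the same number of records.

    `group_of` maps a record to its group label. Each group is cut to the
    size of the smallest group by random sampling (seeded so runs repeat).
    """
    groups = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)
    if not groups:
        return []
    smallest = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], smallest))
    return balanced


# Hypothetical (identity, group) records in which group "A" is over-represented.
identities = [(f"person{i}", "A") for i in range(6)] + [(f"person{i}", "B") for i in range(6, 8)]
sample = balance_by_group(identities, group_of=lambda record: record[1])
print(len(sample))  # 4: two identities from each group
```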
Since 2018, the company has narrowed the accuracy gap between the ethnic groups on which its system performs best (Caucasian) and worst (African) from 11.9 to 3.5 percentage points, with average accuracy now at 89.2 per cent, which it says surpasses the leading commercial systems from Microsoft and Amazon.
The future of biometric data
Despite these improvements, human rights campaigners remain opposed to the technology on ethical grounds. “While inaccurate biometric surveillance presents clear dangers, a more accurate version of facial recognition also presents severe risks to our fundamental rights,” says Hannah Couchman, policy and campaigns officer at Liberty.
Even so, it appears only a matter of time before biometrics play an ever-greater part in our lives, from policing to electronic banking and citizen services. Almost all of India’s 1.25 billion people are already enrolled in Aadhaar, the national ID system and the largest biometric system in the world.
Researcher Drozdowski believes the technology’s potential for good, such as finding missing children or identifying active criminals, must be balanced with safeguards. “Oversight is a big part of biometrics and any sort of automated decision-making,” he says.
Kind at the Ada Lovelace Institute agrees, suggesting the need for continuous risk assessment and awareness of these systems’ limits.
“Biometric technology can absolutely be used for positive ends,” she adds. “But it is going to create real societal and ethical questions, and you have to engage with workers and employers to understand what is being traded off and what is being gained.”