How tech can bridge the global digital language divide

To avoid indigenous populations dropping out of the world’s conversation, steps are being taken to narrow the digital divide – using the tech we have and developing new methods

It’s easy to think that anyone with an internet connection can access the information, networking platforms and convenience-enhancing tech found online. Unfortunately, that is not the case. Of the world’s 7,151 languages, more than half have no digital footprint. This phenomenon is known as the digital language divide and, unless it is addressed, it could accelerate the extinction of thousands of languages.

World language resource Ethnologue reported that about 40% of languages are now endangered, often with fewer than 1,000 speakers remaining. Meanwhile, just 23 languages account for more than half the world’s population. If a language falls out of use, so does knowledge of the rich cultures and histories it describes.

Tech startup Derivation aims to address the problem, using language insight solutions and pioneering AI. Stephen Jones, Derivation’s acting CEO, explains: “Our solutions include measuring every living language to assess their progress towards increased and sustainable relevance in the digital age. Linguistics informs us that languages thrive on a ‘use it or lose it’ basis – if no one is writing, reading, signing or speaking a language, then over generations its usage declines and eventually it simply falls out of use. As the world relies more on technology, that use-or-lose equation is shifting into the digital arena.”

A lack of digital infrastructure (such as keyboards, fonts and operating systems) supporting a language usually indicates that real-world vulnerabilities are affecting that language and its native speakers. Among the most vulnerable, both online and in the real world, are non-written languages – those used in areas that have low literacy rates, insufficient internet or mobile phone coverage, or slow economic growth. Derivation research suggests that languages used in an official capacity (such as in government, legal, economic, health and education systems) are far more likely to be digitally supported than non-institutional languages.

Jones notes: “The digital language divide is felt less in Europe, where there are fewer regional-specific languages, and seen more in Africa and Asia, where there is generally a higher concentration of indigenous languages.

“If you speak an ‘official language’, particularly one in Europe, you are much more likely to find digital options available in your mother tongue. If, though, your first language is an indigenous one, and particularly if it is one that doesn’t have official recognition in Asia and Africa, you are more likely to face digital language exclusion and be forced to operate in a different language.”

The exclusion of these languages from the digital sphere is both a humanitarian and an economic issue, as native speakers of vulnerable languages have little or no access to educational materials and social interaction, while businesses lose out on marketing opportunities and valuable data from within these communities.

In his co-authored research paper, Sustaining Language Use, linguist Dr Gary Simons says the preservation of indigenous language is also a matter of public health and human rights: “Members of a community which is in the process of losing its language and culture experience significant amounts of disruption and stress in all areas of life. A child growing up in a community which is viewed with disdain develops a self-image that reflects that experience. 

“As social norms are abandoned, they frequently are not replaced all at once or with adequate equivalents. This can lead to social tensions, divisions in the community, disruptive and harmful patterns of behaviour, and even violence. There is evidence that communities experiencing such a transition may have elevated levels of alcoholism, drug addiction, HIV/AIDS, and suicide.” He goes on to cite the United Nations Declaration on the Rights of Indigenous Peoples (Article 13.1), which states: “Indigenous peoples have the right to revitalise, use, develop and transmit to future generations their histories, languages, oral traditions, philosophies, writing systems and literatures, and to designate and retain their own names for communities, places and persons.”

Aside from the human-centric arguments, Simons notes the importance of indigenous language preservation in ecological sustainability, in that “many minority language communities possess highly developed bodies of knowledge about their physical environment and have elaborated technologies to adapt and make use of their environment that are transmitted through equally highly developed linguistic forms in their languages. With the loss of those languages and cultures entire areas of human knowledge are also at risk and at the same time biological diversity is also threatened.”

Fortunately, the emerging field of language analytics – a combination of business intelligence, big data and linguistics – is being used to inform the development of AI and machine learning tools that can help bridge the gap between established languages and those being left behind.

SIL is a non-profit organisation involved in 1,600 active language projects in 98 countries. SIL data scientist Daniel Whitenack says that although the rapid integration of AI into existing digital systems has “the potential to further marginalise already-marginalised language communities, artificial intelligence creates amazing new possibilities for much of the world”. Specifically, he points to the advancement of natural language processing (NLP), which “has enabled researchers and engineers to decipher long-lost languages, translate speech in one language to speech in another language directly without converting to text, generate long-form text that adapts to the style and content of human prompts, and translate between language pairs never seen explicitly by computer systems”.

Because they relied heavily on “manually crafted linguistic resources”, earlier NLP models largely ignored vulnerable indigenous languages. Now, Whitenack says, “large tech companies, academic institutions, and grassroots organisations are beginning to realise that NLP methods could and should be extended to minority languages. For a technology like machine translation, this shift is evident from Facebook’s recent request for proposals on neural machine translation for low resource languages, Google’s focus on building translation systems for ‘the next thousand languages’, and the large, multi-institute Masakhane grassroots organisation’s research efforts that focus on machine translation for African languages.”

The new attention placed on AI and NLP for indigenous languages will enable local community members, organisations and non-profits to build momentum towards new, multilingual applications that give minority language speakers a chance to bring their voices into the global conversation and a changing economy. In turn, attention that is currently concentrated on English and Mandarin will move towards broader, more democratic participation across language communities.