The world has been swept up in a whirlwind of AI mania, with tools such as OpenAI’s ChatGPT, Meta’s Llama 2, and Mistral AI’s models captivating millions globally with their ability to generate human-like text.
But for many tech-savvy Africans, the excitement has been tempered by a frustrating reality: when languages like Hausa, Amharic, or Kinyarwanda are entered into the chat, many of these advanced systems falter, often producing nonsensical responses.
Technology experts warn that the lack of large language models (LLMs) in African languages will exclude millions of people on the continent, widening both the digital and economic divides.
Closing the AI language gap
Africa is home to more than 2,000 languages spoken across 54 countries, according to the United Nations Educational, Scientific and Cultural Organization (UNESCO).
However, the majority of African languages remain underrepresented on the internet. English dominates the digital space, accounting for around 50% of all websites, followed by Spanish, German, Japanese, and French.
Along with the Nigerian government initiative, there are also a small but growing number of African startups rising to the challenge of developing AI tools in languages like Swahili, Amharic, Zulu and Sesotho.
In Kenya, for instance, health tech firm Jacaranda Health has pioneered the first LLM operating in Swahili to improve maternal healthcare in East Africa.
Many African languages are low-resource languages, meaning there is a scarcity of data to train these models effectively – unlike high-resource languages such as English or French.
Michael Michie, co-founder of Everse Technology Africa, an AI startup building intelligence into data protection and privacy, said collecting the data needed to train LLMs also raised ethical questions.
In many African communities, oral tradition predominates, and some communities may not want to share their language to train LLMs, a choice that should be respected.