PulseAugur

New Corpus Aims to Boost African Languages in Science

AfriScience-MT is a new parallel corpus developed to address the lack of scientific terminology in six African languages: Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu. Created by professional translators and science communicators, the corpus spans 11 scientific domains and aims to decolonize scientific communication in Africa. In benchmarks of machine translation systems and large language models, closed-source models such as GPT-5.4 and Gemini-3.1-Flash-Lite outperformed open-source models; among open systems, NLLB-1.3B performed best after fine-tuning.

IMPACT This corpus and its benchmarks could accelerate research into low-resource language translation and improve AI's accessibility in scientific domains across Africa.

RANK_REASON The cluster describes a new academic paper introducing a parallel corpus and benchmarking machine translation systems.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Idris Abdulmumin, Tajuddeen Gwadabe, Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Nomonde Khalo, Ibrahim Said Ahmad, Abiodun Modupe, Anina Mumm, Sibusiso Biyela, Michelle Rabie, Johanna Havemann, Marek Rei, Jade Abbott, Vukosi Marivate ·

    AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation

    arXiv:2605.29741v1. Abstract: The dominance of colonial languages in African education and scientific communication limits how hundreds of millions of speakers of African languages access and produce scientific knowledge. A core obstacle is the lack of establish…

  2. arXiv cs.CL TIER_1 English(EN) · Vukosi Marivate ·

    AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation

    The dominance of colonial languages in African education and scientific communication limits how hundreds of millions of speakers of African languages access and produce scientific knowledge. A core obstacle is the lack of established scientific terminology in these languages. We…