A new parallel corpus called AfriScience-MT has been developed to address the lack of scientific terminology in six African languages: Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu. This corpus, created by professional translators and science communicators, spans 11 scientific domains and aims to decolonize scientific communication in Africa. Benchmarking of machine translation systems and large language models revealed that closed-source models like GPT-5.4 and Gemini-3.1-Flash-Lite outperformed open-source models, with NLLB-1.3B showing the best performance among open systems after fine-tuning.
AI IMPACT This corpus and its benchmarks could accelerate research into low-resource language translation and improve AI's accessibility in scientific domains across Africa.
RANK_REASON The cluster describes a new academic paper introducing a parallel corpus and benchmarking machine translation systems.
- AfriScience-MT
- Amharic
- Gemini-3.1-Flash-Lite
- GPT-5.4
- Hausa
- Luganda
- NLLB-1.3B
- Northern Sotho
- TranslateGemma-12B
- Yorùbá
AI-generated summary · Google Gemini · from 2 sources.