LLMs struggle with Hausa and Fongbe translation, metrics unreliable

By PulseAugur Editorial · [1 source] · 2026-06-20 23:23

A new study evaluated four large language models (LLMs) on English-to-Hausa and English-to-Fongbe machine translation, covering two typologically distinct West African languages. Hausa translations reached acceptable quality with models such as GPT-4o mini, whereas Fongbe translations were poor across every system evaluated. The best model also differed by language: Gemini 2.5 Flash led for Fongbe and GPT-4o mini for Hausa, which suggests that performance on one low-resource language does not predict performance on another. The study also found problems with standard automatic metrics: they correlated only weakly with human judgment for Hausa, and neural metrics were limited by embedding collapse for both languages.

IMPACT Highlights the limitations of current LLMs for low-resource languages and the unreliability of standard translation metrics, both of which call for careful evaluation.

RANK_REASON Academic paper evaluating LLM performance on specific languages and metrics.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Prasenjit Mitra · 2026-06-20 23:23

Evaluating Large Language Models for Hausa and Fongbe Machine Translation: Benchmarks, Failures, and Metric Reliability

We investigate the translation quality of current large language models (LLMs) for English-to-Hausa and English-to-Fongbe - two typologically distinct West African languages from the Afroasiatic and Niger-Congo families respectively - and evaluate whether standard automatic metri…
