PulseAugur

ASR systems evaluated for low-resource African language text corpora

Researchers evaluated whether Automatic Speech Recognition (ASR) systems can be used to build text corpora for two low-resource African languages, Fongbe and Hausa. Fine-tuning the MMS-300M model on Fongbe data significantly reduced the Word Error Rate (WER). For Hausa, they used an existing fine-tuned Whisper-Small model. The ASR pipeline shows promise for Hausa, but the quality of the Fongbe transcriptions indicates a need for better models or post-processing.
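For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the sample sentences are hypothetical, and real evaluations usually apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no normalization
    (casing, punctuation, diacritics), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical placeholder sentences: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("the cat sat down", "the dog sat"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.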

IMPACT This research could accelerate the development of language models for underrepresented African languages by improving data acquisition methods.

RANK_REASON The item is an academic paper detailing research on ASR for low-resource languages.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Prasenjit Mitra

    From Speech to Text Corpora: Evaluating ASR-Based Data Acquisition for Low-Resource Fongbe and Hausa

    Low-resource African languages lack text corpora needed for language model training. We investigate whether ASR pipelines can extend text resources for two typologically distinct West African languages: Fongbe (tonal, diacritic-rich) and Hausa (non-tonal). We fine-tune MMS-300M o…