Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 8h

Swivuriso: The South African Next Voices Multilingual Speech Dataset

Researchers have introduced Swivuriso, a 3,000-hour multilingual speech dataset built to advance automatic speech recognition (ASR) for seven South African languages. Developed under the African Next Voices project, the dataset covers domains such as agriculture and healthcare and aims to fill gaps in existing ASR resources. The paper describes how the dataset was created, including its ethical considerations and data collection methods, and reports initial results from training ASR models on it.

IMPACT Expands speech recognition resources for underrepresented South African languages, potentially enabling new AI applications in South Africa.

Vukosi Marivate
Swivuriso
African Next Voices