Swivuriso: The South African Next Voices Multilingual Speech Dataset

New South African speech dataset advances multilingual automatic speech recognition

By PulseAugur Editorial · [1 source] · 2026-06-10 04:00

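Researchers have introduced Swivuriso, a 3000-hour multilingual speech dataset intended to advance automatic speech recognition (ASR) for seven South African languages. Developed under the African Next Voices project, the dataset covers key domains such as agriculture and healthcare and aims to fill gaps in existing ASR resources. The paper describes how the dataset was created, including ethical considerations and data collection methods, and reports initial results from training ASR models.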

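Impact: Strengthens multilingual speech recognition for underrepresented languages, potentially enabling new AI applications in South Africa.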

Ranking rationale: This cluster contains an academic paper describing a new dataset for speech recognition research.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Vukosi Marivate, Kayode Olaleye, Sitwala Mundia, Andinda Bakainga, Unarine Netshifhefhe, Mahmooda Milanzie, Tsholofelo Hope Mogale, Thapelo Sindane, Zainab Abdulrasaq, Kesego Mokgosi, Chijioke Okorie, Nia Zion Van Wyk, Graham Morrissey, Dale Dunbar, Fran… · 2026-06-10 04:00

Swivuriso: The South African Next Voices Multilingual Speech Dataset

arXiv:2512.02201v3. Abstract: This paper introduces Swivuriso, a 3000-hour multilingual speech dataset developed as part of the African Next Voices project, to support the development and benchmarking of automatic speech recognition (ASR) technologies in sev…
