Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

ASR models evaluated on Dutch children's speech; Whisper-medium performs best

By the PulseAugur editorial team · [1 source] · 2026-05-29 04:00

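A study recently posted on arXiv evaluated nine state-of-the-art automatic speech recognition (ASR) models, including Whisper, Parakeet, and Wav2Vec2, on Dutch children's speech datasets. A fine-tuned Whisper-medium model showed the best overall performance, with a word error rate (WER) of 5.54% on the JASMIN dataset and 70.37% on the more challenging DART dataset. The researchers also developed a method for automatically identifying utterances that are pronounced accurately and recognized with high confidence, which reduces the need for manual verification and allows a large portion of the data to be transcribed automatically.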
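For readers unfamiliar with the metric, the following is a minimal Python sketch of how a corpus-level WER is typically computed, followed by a simple confidence-based filter. The filter is an illustration only and is not the paper's method, which the summary does not describe. The `Utterance` fields, the `auto_accept` helper, and the `min_confidence=0.9` threshold are all hypothetical, as is the assumption that each utterance has a known prompt text to compare against.

```python
from dataclasses import dataclass


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(references, hypotheses) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


@dataclass
class Utterance:
    # All fields are hypothetical; they are not taken from the paper.
    prompt: str        # text the child was expected to say
    hypothesis: str    # ASR output for the recording
    confidence: float  # utterance-level ASR confidence in [0, 1]


def auto_accept(utterances, min_confidence=0.9):
    """Keep utterances whose ASR output equals the prompt with high confidence.

    Assumed heuristic for illustration: anything returned here would skip
    manual verification; everything else would go to a human annotator.
    """
    return [
        u for u in utterances
        if u.hypothesis == u.prompt and u.confidence >= min_confidence
    ]
```

Under this definition, a WER of 5.54% corresponds to roughly 5.5 word errors per 100 reference words. Because insertions count as errors, WER can exceed 100% on very difficult data.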

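Impact: This research is expected to improve the accuracy and efficiency of children's speech transcription in linguistic research.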

Ranking rationale: This is a research paper detailing the performance of ASR models on a specific type of speech data.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik · 2026-05-29 04:00

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

arXiv:2605.28833v1 Announce Type: cross Abstract: Automatic speech recognition (ASR) has the potential to substantially reduce manual annotation effort in child speech research by generating automatic transcriptions. However, obtaining reliably high-quality ASR transcriptions for…
