Scaling Human and G2P Supervision for Robust Phonetic Transcription

G2P supervision yields diminishing returns for phonetic transcription

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

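A new research paper examines how well grapheme-to-phoneme (G2P) models work for generating phonetic transcriptions at scale. The study finds that G2P supervision helps only when fewer than 20-30 hours of human annotation are available; beyond that point it yields no significant improvement and may even reduce robustness across dialects. According to the study, automatic speech recognition (ASR) pretraining is more effective at improving phonetic transcription accuracy, especially for non-native and atypical speech, with error rates 2.3x lower than those of previous systems.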

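Impact: The findings suggest that ASR pretraining is more effective than scaling G2P supervision for robust phonetic transcription, which has implications for speech technology development.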

Ranking rationale: This cluster contains a research paper published on arXiv that details new findings on phonetic transcription.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Alexander Metzger, Aruna Srivastava, Ruslan Mukhamedvaleev · 2026-06-16 04:00

Scaling Human and G2P Supervision for Robust Phonetic Transcription

arXiv:2606.16019v1 Announce Type: new Abstract: Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study…
