Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

New Hungarian ASR corpus doubles training data and improves accuracy

By the PulseAugur editorial team · [2 sources] · 2026-05-29 16:01

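Researchers have released BEA-Dialogue+, an expanded corpus for Hungarian conversational automatic speech recognition (ASR). The new dataset raises the available training data to 200 hours by relaxing the split criteria to admit more material while preserving speaker separation. Evaluations with Whisper and FastConformer models indicate that the larger dataset, particularly when combined with serialized output training (SOT) fine-tuning, significantly improves transcription accuracy metrics.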

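Impact: Provides a larger, more challenging benchmark for Hungarian conversational ASR, enabling better training and evaluation of transcription systems.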

Ranking rationale: This cluster contains an academic paper detailing a new dataset and an evaluation of ASR models.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Máté Gedeon, Piroska Zsófia Barta, Péter Mihajlik, Katalin Mády · 2026-06-01 04:00

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

arXiv:2605.31469v1 · Announce Type: cross · Abstract: Conversational automatic speech recognition in Hungarian is constrained by the limited amount of publicly available dialogue-style training data. The BEA-Dialogue corpus addresses this need, but its strictly speaker-disjoint train…

arXiv cs.AI TIER_1 English(EN) · Katalin Mády · 2026-05-29 16:01

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

Conversational automatic speech recognition in Hungarian is constrained by the limited amount of publicly available dialogue-style training data. The BEA-Dialogue corpus addresses this need, but its strictly speaker-disjoint train/dev/eval split reduces the usable material to onl…
