Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

New ASR Methods Tackle Compute Scaling and Multilingual Evaluation

By the PulseAugur editorial team · [6 sources] · 2026-06-01 17:49

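Researchers are developing new methods to improve automatic speech recognition (ASR) systems. One method, LARM, uses a depth-conditioned looped Transformer that allows test-time compute to be adjusted, and achieves performance comparable to deeper models. Another system, Murmur, exploits attention sparsity to handle long-form ASR, balancing the low latency of chunk-based processing against the accuracy of long-context models. In addition, a new metric called Script-Normalized WER (SN-WER) normalizes script differences so that ASR performance in multilingual settings, particularly for Indic languages, is evaluated more accurately.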

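Impact: Advances in ASR efficiency and evaluation metrics could improve the accuracy and usability of voice interfaces and transcription services.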

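Ranking rationale: This cluster contains several academic papers detailing new research on automatic speech recognition (ASR) systems and evaluation metrics.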


AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

arXiv cs.CL TIER_1 English(EN) · Sungmook Woo, Hyungu Kang, Chanwoo Kim · 2026-06-05 04:00

Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

arXiv:2606.05179v1. Abstract: Punctuation restoration improves ASR (Automatic Speech Recognition) readability. However, streaming ASR requires online decisions with limited future context. In streaming ASR, the system predicts punctuation incrementally, which mak…
arXiv cs.LG TIER_1 English(EN) · Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi · 2026-06-04 04:00

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

arXiv:2606.04678v1. Abstract: End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse …
arXiv cs.LG TIER_1 English(EN) · Ina Kodrasi · 2026-06-03 10:01

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse a shared Transformer block recurrently, but we f…
arXiv cs.AI TIER_1 English(EN) · Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci · 2026-06-02 04:00

MURMUR: An Efficient Inference System for Long-Form Speech Recognition

arXiv:2606.01483v1. Abstract: Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cr…
arXiv cs.CL TIER_1 English(EN) · Priyaranjan Pattnayak · 2026-06-02 04:00

SN-WER: Script-Normalized Word Error Rate for Multi-Script Indic ASR Evaluation

arXiv:2606.02548v1. Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual sett…
arXiv cs.CL TIER_1 English(EN) · Priyaranjan Pattnayak · 2026-06-01 17:49

SN-WER: Script-Normalized Word Error Rate for Multi-Script Indic ASR Evaluation

Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text. W…
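
To illustrate the script-mismatch problem the SN-WER abstract describes, here is a minimal Python sketch of word-level WER with an optional normalization step applied to both sides before scoring. It is not the paper's implementation: the `TOY_ROMANIZATION` table, the `to_common_script` helper, and the example sentences are hypothetical, and a real system would presumably use a proper transliteration scheme rather than a lookup table.

```python
from typing import Callable, Dict, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str,
        normalize: Callable[[str], str] = lambda w: w) -> float:
    """WER = word edit distance / reference length, after normalizing each word."""
    ref_words = [normalize(w) for w in ref.split()]
    hyp_words = [normalize(w) for w in hyp.split()]
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical toy table (assumption): Devanagari word -> romanized form.
TOY_ROMANIZATION: Dict[str, str] = {
    "मेरा": "mera",
    "नाम": "naam",
    "राम": "ram",
    "है": "hai",
}


def to_common_script(word: str) -> str:
    """Map a word to a shared romanized form; unknown words pass through."""
    return TOY_ROMANIZATION.get(word, word).lower()


# Reference in Devanagari, built from the table keys: "मेरा नाम राम है".
reference = " ".join(TOY_ROMANIZATION)
# Romanized hypothesis with one genuinely wrong word ("shyam" for "ram").
hypothesis = "mera naam shyam hai"

plain_wer = wer(reference, hypothesis)
normalized_wer = wer(reference, hypothesis, to_common_script)
```

In this toy case, plain WER counts all four words as errors because the scripts differ, while the normalized score counts only the one real substitution.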
