PulseAugur

New ASR systems and metrics tackle long-form and multilingual challenges

Two new research papers introduce methods to improve automatic speech recognition (ASR). The first, "MURMUR," presents an efficient inference system for long-form ASR. It balances accuracy and low latency by processing audio in intermediate-sized chunks and optimizing attention sparsity. The second, "SN-WER," proposes Script-Normalized WER (SN-WER), an evaluation metric that normalizes the scripts of the reference and the hypothesis before comparing them. It targets a weakness of the standard Word Error Rate (WER) in multilingual ASR, particularly for Indic languages: a correctly recognized word written in a different script is counted as an error.
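The summary describes MURMUR as processing audio in intermediate-sized chunks. The Python sketch below is a generic illustration only, not MURMUR's algorithm, whose details are not given here. It computes fixed-size window boundaries over a sampled signal. The helper name `chunk_audio`, the 16 kHz sample rate, the 30 s window, and the 5 s overlap are all illustrative assumptions.

```python
from typing import List, Tuple


def chunk_audio(num_samples: int, sample_rate: int,
                chunk_seconds: float, overlap_seconds: float) -> List[Tuple[int, int]]:
    """Split a signal of num_samples into fixed-size, overlapping windows.

    Hypothetical helper (not from the paper): returns (start, end) sample
    indices; the last window is truncated at the end of the signal.
    """
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk must be positive and longer than the overlap")
    windows = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        windows.append((start, end))
        if end >= num_samples:  # this window reaches the end of the audio
            break
        start += step  # next window starts `overlap` before this one ends
    return windows


# 60 s of 16 kHz audio, 30 s windows that overlap by 5 s (assumed values).
for start, end in chunk_audio(60 * 16_000, 16_000, 30.0, 5.0):
    print(f"{start / 16_000:5.1f}s - {end / 16_000:5.1f}s")
```

Each window is independent, so a batch of windows can be transcribed in parallel. Shorter windows cost less per window but give the model less surrounding context. This is one way to read the accuracy/latency trade-off the summary describes. The overlap is an assumed, commonly used mitigation for words that are cut at window edges.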

IMPACT These papers introduce new techniques for improving the accuracy/latency trade-off of long-form ASR and the evaluation of multilingual ASR. These techniques could lead to more robust speech-to-text systems for diverse languages and long-form content.

RANK_REASON Two academic papers published on arXiv introducing new methods for ASR.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci ·

    MURMUR: An Efficient Inference System for Long-Form ASR

    arXiv:2606.01483v1 · Announce Type: cross

    Abstract: Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cr…

  2. arXiv cs.CL TIER_1 English(EN) · Priyaranjan Pattnayak ·

    SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

    arXiv:2606.02548v1 · Announce Type: new

    Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual sett…
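Standard WER is the word-level edit distance between reference and hypothesis divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the reference length. The Python sketch below shows how a script mismatch inflates WER and how mapping both strings to one script first removes the spurious error. It illustrates the idea described in the abstract and is not the paper's method. The code-point-offset mapping onto Devanagari in the hypothetical `to_devanagari` helper is an assumption made here (a rough, commonly used approximation for Indic scripts). The names `wer` and `word_edit_distance` are likewise illustrative.

```python
from typing import Callable, List, Optional


def to_devanagari(text: str) -> str:
    """Map characters from other Indic Unicode blocks onto Devanagari.

    Assumption (not the paper's normalizer): Unicode lays out Devanagari
    (U+0900) through Malayalam (U+0D00) in parallel 128-code-point blocks,
    so many letters share an offset. Some positions have no counterpart,
    and romanized (Latin-script) text is left unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0980 <= cp <= 0x0D7F:  # Bengali ... Malayalam blocks
            out.append(chr(0x0900 + (cp - 0x0900) % 0x80))
        else:
            out.append(ch)
    return "".join(out)


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein over words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str,
        normalize: Optional[Callable[[str], str]] = None) -> float:
    """WER = (S + D + I) / N, optionally normalizing both strings first."""
    if normalize is not None:
        ref, hyp = normalize(ref), normalize(hyp)
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Same two words; the hypothesis writes the first one in Bengali script.
reference = "\u0915\u092e \u0915\u093e\u092e"   # कम काम (Devanagari)
hypothesis = "\u0995\u09ae \u0915\u093e\u092e"  # কম काम (Bengali + Devanagari)
print(wer(reference, hypothesis))                           # 0.5
print(wer(reference, hypothesis, normalize=to_devanagari))  # 0.0
```

Plain WER counts the Bengali-script word as a substitution (1 error out of 2 words), even though nothing was misrecognized. After normalization the two strings are identical. A practical normalizer would also need to handle romanized hypotheses and letters with no cross-script counterpart, and the offset mapping covers neither.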