New ASR methods tackle compute scaling and multilingual evaluation

By PulseAugur Editorial · [6 sources] · 2026-06-01 17:49

Several new arXiv papers propose ways to improve automatic speech recognition (ASR) systems. One approach, LARM, uses a depth-conditioned looped Transformer so that test-time computation can be adjusted, and reports performance competitive with deeper models. Another system, MURMUR, targets long-form ASR by combining the low latency of chunk-based processing with the accuracy of long-context models, exploiting attention sparsity. A third paper proposes a metric, Script-Normalized WER (SN-WER), for evaluating ASR in multilingual settings, particularly for Indic languages: standard WER can overestimate errors when the reference and the hypothesis encode the same words in different scripts, and SN-WER normalizes for those script differences.
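
To make the script-mismatch problem concrete, here is a minimal Python sketch of word error rate computed with and without a normalization step. It is not the SN-WER paper's procedure: the lookup table `TOY_ROMAN_TO_DEVANAGARI` and the helper `to_common_script` are hypothetical stand-ins for whatever script normalization the authors actually use, and the example sentence is invented for illustration.

```python
from typing import Callable, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str,
        normalize: Callable[[str], str] = lambda w: w) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words = [normalize(w) for w in ref.split()]
    hyp_words = [normalize(w) for w in hyp.split()]
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical toy lookup (assumption, not from the paper):
# romanized Hindi words mapped to Devanagari.
TOY_ROMAN_TO_DEVANAGARI = {"namaste": "नमस्ते", "duniya": "दुनिया"}


def to_common_script(word: str) -> str:
    """Map a romanized word to Devanagari if the toy table knows it."""
    return TOY_ROMAN_TO_DEVANAGARI.get(word.lower(), word)


reference = "नमस्ते दुनिया"
hypothesis = "namaste duniya"  # same words, emitted in Latin script

plain = wer(reference, hypothesis)
normalized = wer(reference, hypothesis, to_common_script)
print(plain, normalized)
```

In this toy case the plain score counts both words as substitutions even though the hypothesis contains the right words, while the normalized score compares them in a single script. A real system would need a proper transliteration or normalization scheme in place of the two-entry table.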

IMPACT Advances in ASR efficiency and evaluation metrics could improve the accuracy and usability of voice interfaces and transcription services.

RANK_REASON The cluster contains multiple academic papers detailing new research in automatic speech recognition (ASR) systems and evaluation metrics.



AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

arXiv cs.CL TIER_1 English(EN) · Sungmook Woo, Hyungu Kang, Chanwoo Kim · 2026-06-05 04:00

Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

arXiv:2606.05179v1 · Abstract: Punctuation restoration improves ASR (Automatic Speech Recognition) readability. However streaming ASR requires online decisions with limited future context. In streaming ASR, the system predicts punctuation incrementally, which mak…

arXiv cs.LG TIER_1 English(EN) · Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi · 2026-06-04 04:00

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

arXiv:2606.04678v1 · Abstract: End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse …

arXiv cs.LG TIER_1 English(EN) · Ina Kodrasi · 2026-06-03 10:01

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse a shared Transformer block recurrently, but we f…

arXiv cs.AI TIER_1 English(EN) · Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci · 2026-06-02 04:00

MURMUR: An Efficient Inference System for Long-Form ASR

arXiv:2606.01483v1 · Abstract: Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cr…

arXiv cs.CL TIER_1 English(EN) · Priyaranjan Pattnayak · 2026-06-02 04:00

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

arXiv:2606.02548v1 · Abstract: Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual sett…

arXiv cs.CL TIER_1 English(EN) · Priyaranjan Pattnayak · 2026-06-01 17:49

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings where ASR models may emit romanized text. W…
