Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems
Researchers are developing new methods to improve automatic speech recognition (ASR) systems. One approach, LARM, uses a depth-conditioned looped Transformer to allow adjustable test-time computation, achieving performance competitive with deeper models. Another system, Murmur, addresses long-form ASR by using attention sparsity to balance low-latency chunk-based processing against the accuracy of long-context models. Additionally, a new metric, Script-Normalized WER (SN-WER), has been proposed to evaluate ASR performance more accurately in multilingual settings, particularly for Indic languages, by normalizing for script differences.
AI IMPACT Advances in ASR efficiency and evaluation metrics could improve the accuracy and usability of voice interfaces and transcription services.
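
To make the idea of a script-normalized error rate concrete, here is a minimal sketch of standard word error rate (WER) with a pluggable normalization step applied before scoring. The summary does not describe the actual SN-WER normalization rules, so the `normalize` function below is a stand-in assumption that uses only Unicode NFKC normalization and case folding; the function names are hypothetical and not taken from the proposal.

```python
import unicodedata


def normalize(text):
    """Placeholder normalizer (an assumption, NOT the published SN-WER rules).

    Applies Unicode NFKC normalization and case folding, then splits on
    whitespace, so that differently encoded forms of the same word compare equal.
    """
    return unicodedata.normalize("NFKC", text).casefold().split()


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, normalizer=normalize):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = normalizer(reference)
    hyp = normalizer(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


# The same Devanagari word encoded two ways: precomposed QA (U+0958)
# versus KA (U+0915) followed by a combining nukta (U+093C).
precomposed = "\u0958\u093f\u0932\u093e"
decomposed = "\u0915\u093c\u093f\u0932\u093e"

raw_wer = wer(precomposed, decomposed, normalizer=str.split)
normalized_wer = wer(precomposed, decomposed)
```

Without normalization the two encodings count as a full word error, while the normalized score treats them as identical. A real script-normalized metric would presumably need language-specific rules (for example, transliteration between scripts) that go beyond Unicode normalization.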