TOOL · arXiv cs.CL · 7h

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Researchers have developed UMA-Split, a non-autoregressive speech recognition model for both English and Mandarin. It addresses a limitation of the original unimodal aggregation (UMA) approach, which struggled with languages such as English, where tokens may not align well with acoustic frames. UMA-Split adds a split module that lets each aggregated frame map to multiple tokens, which the authors report improves representation learning and recognition performance in both languages.
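To make the idea concrete, here is a minimal toy sketch of aggregation followed by a split step. It is not the authors' implementation. The valley-based segmentation rule, the weighted average, and the function names `aggregate` and `split` are all assumptions chosen for illustration, and the fixed projection matrices stand in for whatever learned layers the real model uses. The brief does not describe how the model decides how many tokens each aggregated frame emits, so this sketch simply emits one output per projection.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def aggregate(frames: List[Vector], weights: List[float]) -> List[Vector]:
    """Merge consecutive frames into segments and average each segment.

    Assumption: a new segment starts at every interior local minimum
    ("valley") of the per-frame weights, so each segment covers one
    rise-then-fall (unimodal) stretch of weights.
    """
    n = len(frames)
    starts = [0]
    for t in range(1, n - 1):
        if weights[t] < weights[t - 1] and weights[t] <= weights[t + 1]:
            starts.append(t)
    bounds = starts + [n]

    aggregated = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        total = sum(weights[lo:hi])
        dim = len(frames[lo])
        # Weighted average of the frames in this segment.
        merged = [
            sum(weights[t] * frames[t][d] for t in range(lo, hi)) / total
            for d in range(dim)
        ]
        aggregated.append(merged)
    return aggregated


def split(aggregated: List[Vector], projections: List[Matrix]) -> List[Vector]:
    """Hypothetical split step: one output vector per projection per frame.

    Each projection is a plain matrix here; in a trained model these would
    be learned parameters.
    """
    outputs = []
    for vec in aggregated:
        for proj in projections:
            outputs.append(
                [sum(row[d] * vec[d] for d in range(len(vec))) for row in proj]
            )
    return outputs


# Six 1-dimensional frames whose weights form two unimodal humps.
frames = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
weights = [0.1, 0.8, 0.2, 0.1, 0.9, 0.3]

merged = aggregate(frames, weights)
tokens = split(merged, projections=[[[1.0]], [[-1.0]]])
print(len(merged), len(tokens))
```

In this toy run the six frames collapse into two aggregated frames, and the split step expands them into four output positions. That is the property the summary highlights: one aggregated frame is no longer limited to a single token.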

IMPACT Extends unimodal aggregation so that a single non-autoregressive speech recognition approach can serve both English and Mandarin.

English
Mandarin
UMA-Split
Ying Fang