UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition
Researchers have developed UMA-Split, a non-autoregressive model for speech recognition in both English and Mandarin. It addresses a limitation of the original unimodal aggregation (UMA) approach, which struggled with languages such as English, where tokens may not align well with acoustic frames. UMA-Split adds a split module that lets each aggregated frame map to multiple tokens, improving representation learning and performance across languages.
AI IMPACT: Introduces a new method for improving speech recognition accuracy in both English and Mandarin.
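
The aggregate-then-split idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes that segments are delimited by valleys (local minima) in a per-frame weight sequence, that frames within a segment are combined by a weighted average, and that the split step applies K separate linear projections to each aggregated frame. The function names `find_segments`, `aggregate`, and `split`, the valley rule, and the toy projections are all hypothetical.

```python
# Illustrative sketch only; names and the valley rule are assumptions,
# not taken from the UMA-Split paper or code.

def find_segments(weights):
    """Return (start, end) index pairs, cutting at valleys of the weights."""
    if not weights:
        return []
    boundaries = [0]
    for t in range(1, len(weights) - 1):
        # Assumed valley rule: lower than the previous frame and
        # not higher than the next one.
        if weights[t] < weights[t - 1] and weights[t] <= weights[t + 1]:
            boundaries.append(t)
    boundaries.append(len(weights))
    return list(zip(boundaries[:-1], boundaries[1:]))


def aggregate(frames, weights):
    """Weighted-average the frames in each segment (weights assumed positive)."""
    dim = len(frames[0])
    out = []
    for start, end in find_segments(weights):
        total = sum(weights[start:end])
        out.append([
            sum(weights[t] * frames[t][d] for t in range(start, end)) / total
            for d in range(dim)
        ])
    return out


def split(aggregated, projections):
    """Map each aggregated frame to len(projections) output frames."""
    out = []
    for vec in aggregated:
        for mat in projections:  # one hypothetical linear map per output slot
            out.append([sum(w * x for w, x in zip(row, vec)) for row in mat])
    return out


# Toy input: six 1-dimensional frames whose weights form two "humps".
frames = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
weights = [0.1, 0.8, 0.3, 0.2, 0.9, 0.4]

aggregated = aggregate(frames, weights)              # one vector per hump
outputs = split(aggregated, [[[1.0]], [[-1.0]]])     # K = 2 slots per vector
print(len(aggregated), len(outputs))
```

With K output slots per aggregated frame, the output sequence is K times longer than the aggregated one, which is what leaves room for one segment to emit more than one token. How the model decides which slots carry tokens (for example, through a blank symbol in a CTC-style loss) is not stated in the summary and is left out of the sketch.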