New UMA-Split model enhances English and Mandarin speech recognition

By PulseAugur Editorial · [1 source] · 2026-06-18 04:00

Researchers have proposed UMA-Split, a non-autoregressive model for speech recognition in both English and Mandarin. It addresses a limitation of the original unimodal aggregation (UMA) approach, which explicitly segments and aggregates acoustic frames and struggles with languages such as English, where tokens may not align well with acoustic frames. UMA-Split adds a split module that lets each aggregated frame map to multiple tokens, improving representation learning and performance across both languages.
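The aggregate-then-split idea can be illustrated with a toy sketch. It is not the authors' implementation: it assumes segments are cut at local minima of per-frame weights, that each segment is reduced to a weighted average, and that each aggregated frame gets a fixed number of output slots whose blank labels are simply dropped. The names `segment_at_valleys`, `aggregate`, `split_and_decode`, and `toy_classify` are hypothetical, and the classifier stands in for a learned model.

```python
BLANK = "<blank>"


def segment_at_valleys(weights):
    """Group frame indices into segments, cutting at local minima of the weights."""
    segments, start = [], 0
    for t in range(1, len(weights) - 1):
        # Assumed boundary rule: a frame whose weight is a valley starts a new segment.
        if weights[t] <= weights[t - 1] and weights[t] < weights[t + 1]:
            segments.append(list(range(start, t)))
            start = t
    segments.append(list(range(start, len(weights))))
    return segments


def aggregate(frames, weights):
    """Reduce each segment to one vector: the weighted average of its frames."""
    dim = len(frames[0])
    aggregated = []
    for seg in segment_at_valleys(weights):
        total = sum(weights[t] for t in seg)  # assumes positive weights
        aggregated.append(
            [sum(weights[t] * frames[t][d] for t in seg) / total for d in range(dim)]
        )
    return aggregated


def split_and_decode(aggregated, classify, slots=2):
    """Let each aggregated frame emit up to `slots` labels, dropping blanks."""
    tokens = []
    for frame in aggregated:
        for k in range(slots):
            label = classify(frame, k)
            if label != BLANK:
                tokens.append(label)
    return tokens


def toy_classify(frame, slot):
    """Hypothetical stand-in for a learned classifier over split outputs."""
    if slot == 0:
        return "A"
    return "B" if frame[0] > 3 else BLANK


frames = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
weights = [0.1, 0.8, 0.3, 0.2, 0.9, 0.4]
aggregated = aggregate(frames, weights)
tokens = split_and_decode(aggregated, toy_classify)
```

In this toy run six frames collapse into two aggregated frames, yet three tokens come out, because the second aggregated frame fills both of its slots. That is the property the split step is meant to provide when one acoustic segment covers more than one token.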

IMPACT Introduces a new method for improving speech recognition accuracy in both English and Mandarin.

RANK_REASON The cluster describes a new research paper proposing a novel model for speech recognition.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Ying Fang, Xiaofei Li · 2026-06-18 04:00

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

arXiv:2509.14653v2 Abstract: This paper proposes a unimodal aggregation (UMA) based nonautoregressive model for both English and Mandarin speech recognition. The original UMA explicitly segments and aggregates acoustic frames (with unimodal weights that fir…
