New Vietnamese ASR uses phoneme-based syllabic modeling

By PulseAugur Editorial · [1 source] · 2026-05-28 04:00

Researchers have developed a Syllabic-Structure Decoder for Vietnamese Automatic Speech Recognition (ASR). The approach models speech at the phoneme level, explicitly capturing the phonological composition of each syllable instead of predicting orthographic units such as characters or words. On two Vietnamese speech benchmarks, LSVSC and UIT-ViMD, the system outperformed strong baselines including PhoWhisper and Wav2Vec2, despite using a significantly smaller vocabulary and no additional training resources.
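
To make the idea of syllable composition concrete, here is a minimal sketch that splits a written Vietnamese syllable into an onset, a rhyme, and a tone. It is not the paper's decoder: it works on spelling rather than on phonemes, and the function name `decompose`, the `ONSETS` inventory, and the three-way split are assumptions made purely for illustration.

```python
import unicodedata

# Combining tone marks used in Vietnamese spelling, mapped to tone names.
TONE_MARKS = {
    "\u0300": "huyền",  # grave
    "\u0301": "sắc",    # acute
    "\u0303": "ngã",    # tilde
    "\u0309": "hỏi",    # hook above
    "\u0323": "nặng",   # dot below
}

# Written initial consonants, longest first so "ngh" wins over "ng" and "n".
# Hypothetical, simplified inventory: "gi" and "qu" get no special handling.
ONSETS = ["ngh", "ng", "nh", "ch", "gh", "kh", "ph", "th", "tr",
          "b", "c", "d", "\u0111", "g", "h", "k", "l", "m", "n",
          "p", "q", "r", "s", "t", "v", "x"]


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split one written syllable into (onset, rhyme, tone)."""
    # NFD separates base letters from their combining marks.
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # the unmarked level tone
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # NFC recombines vowel-quality marks (circumflex, breve, horn) with letters.
    base = unicodedata.normalize("NFC", "".join(kept))
    onset = next((o for o in ONSETS if base.startswith(o)), "")
    return onset, base[len(onset):], tone
```

Under these assumptions, `decompose("tiếng")` returns `("t", "iêng", "sắc")`. A decoder built on such components can draw from a small, closed inventory instead of a large list of whole syllables or subwords, which is consistent with the smaller vocabulary the summary reports.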




AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Nghia Hieu Nguyen, Quan Ngoc Hoang, Long Hoang Huu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen · 2026-05-28 04:00

Syllabic-Structure Decoder for Automatic Speech Recognition in Vietnamese

arXiv:2605.27874v1. Abstract: Most Automatic Speech Recognition (ASR) systems formulate transcription as a prediction problem over orthographic units such as characters, subwords, or words. Although effective, such representations do not explicitly reflect the p…
