PulseAugur

New Vietnamese ASR uses phoneme-based syllabic modeling

Researchers have proposed a Syllabic-Structure Decoder for Vietnamese Automatic Speech Recognition (ASR). Instead of predicting orthographic units such as characters or words, the decoder works at the phoneme level and explicitly models the phonological composition of each syllable. The system outperformed strong baselines, including PhoWhisper and Wav2Vec2, on two Vietnamese speech benchmarks, LSVSC and UIT-ViMD, while using a significantly smaller vocabulary and no additional training resources.
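To give a rough sense of what "phonological composition of syllables" can mean, the sketch below splits a written Vietnamese syllable into an onset, a rhyme, and a tone. It is an illustration only and not the paper's decoder: it works on spelling rather than true phonemes, and the function name `split_syllable`, the onset list, and the tone labels are assumptions made for this example.

```python
import unicodedata

# Combining marks that encode Vietnamese tones (assumed labels, no diacritics).
# The circumflex, breve, and horn marks change vowel quality and are kept.
TONE_MARKS = {
    "\u0300": "huyen",  # grave
    "\u0301": "sac",    # acute
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

# Orthographic onsets, longest first so "ngh" wins over "ng" and "n".
ONSETS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Return (onset, rhyme, tone) for one written Vietnamese syllable."""
    # Decompose so tone marks become separate combining characters.
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # the unmarked (level) tone
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # Recompose so vowels such as "ê" or "ơ" are single characters again.
    base = unicodedata.normalize("NFC", "".join(kept))
    onset = next((o for o in ONSETS if base.startswith(o)), "")
    return onset, base[len(onset):], tone


if __name__ == "__main__":
    for word in ["tiếng", "việt", "nghĩa", "anh"]:
        print(word, split_syllable(word))
```

Because every syllable is assembled from a small inventory of onsets, rhymes, and tones, a model that predicts these parts needs far fewer output symbols than one that predicts whole words, which is consistent with the smaller vocabulary the summary mentions. Edge cases such as "gì", where the onset spelling absorbs the vowel, are not handled here.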

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Nghia Hieu Nguyen, Quan Ngoc Hoang, Long Hoang Huu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

    Syllabic-Structure Decoder for Automatic Speech Recognition in Vietnamese

    arXiv:2605.27874v1 · Abstract: Most Automatic Speech Recognition (ASR) systems formulate transcription as a prediction problem over orthographic units such as characters, subwords, or words. Although effective, such representations do not explicitly reflect the p…