Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise

By PulseAugur Editorial Team · [4 sources] · 2026-05-04 04:00

Researchers have developed a benchmark to test whether current audio-language models can use additional clinical context to improve automatic speech recognition (ASR) for dysarthric speech. Initial findings indicate that these models do not benefit significantly from diagnosis labels or detailed clinical descriptions, and some prompts even degrade performance. However, fine-tuning with clinical context shows promise, achieving a substantial reduction in word error rate for specific subgroups, such as speakers with Down syndrome.
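For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the paper; the function names and the toy sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline WER removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy transcripts, invented for illustration only.
    reference = "please turn on the kitchen light"
    before = "please turn the chicken light"
    after = "please turn on the chicken light"
    wer_before = word_error_rate(reference, before)
    wer_after = word_error_rate(reference, after)
    print(f"WER before: {wer_before:.3f}, after: {wer_after:.3f}")
    print(f"Relative reduction: {relative_wer_reduction(wer_before, wer_after):.1%}")
```

Note that an absolute WER drop and a relative reduction are different quantities; the summary above does not say which one the reported improvement refers to.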

Impact: Highlights limitations in current ASR models for atypical speech and offers a path toward more inclusive technologies.

Ranking rationale: Academic paper presenting a new benchmark and fine-tuning method for ASR models.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.CL TIER_1 English(EN) · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu · 2026-05-05 04:00

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

arXiv:2605.02782v1 Announce Type: cross Abstract: Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at infer…
arXiv cs.CL TIER_1 English(EN) · Shih-Chii Liu · 2026-05-04 16:24

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-04 16:24

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models …
arXiv cs.LG TIER_1 English(EN) · Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson · 2026-05-04 04:00

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

arXiv:2603.15988v2 Announce Type: replace-cross Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits r…
