PulseAugur
research · [4 sources]

Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise

Researchers have developed a benchmark to test whether current audio-language models can use additional clinical context to improve automatic speech recognition (ASR) for dysarthric speech. Initial findings indicate that these models do not significantly benefit from diagnosis labels or detailed clinical descriptions, and some prompts even degrade performance. Fine-tuning with clinical context, however, shows promise, achieving a substantial reduction in word error rate for specific subgroups such as speakers with Down syndrome.
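The headline metric in these results is word error rate (WER). As a hedged illustration of what that number measures (a generic sketch, not the paper's evaluation code), WER is the word-level edit distance between the reference transcript and the model's hypothesis, normalized by reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# One missed word out of four reference words -> WER 0.25
print(wer("the quick brown fox", "the quick fox"))
```

A "substantial reduction" in WER for a subgroup means this ratio drops on that subgroup's test utterances after fine-tuning with clinical context.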

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Highlights limitations in current ASR models for atypical speech and offers a path toward more inclusive technologies.

RANK_REASON Academic paper presenting a new benchmark and fine-tuning method for ASR models.

Read on arXiv cs.LG →

COVERAGE [4]

  1. arXiv cs.CL TIER_1 · Pehuén Moure, Niclas Pokel, Bilal Bounajma, Yingqiang Gao, Roman Boehringer, Longbiao Cheng, Shih-Chii Liu ·

    When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

    arXiv:2605.02782v1 Announce Type: cross Abstract: Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at infer…

  2. arXiv cs.CL TIER_1 · Shih-Chii Liu ·

    When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

    Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models …

  3. Hugging Face Daily Papers TIER_1 ·

    When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

    Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models …

  4. arXiv cs.LG TIER_1 · Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson ·

    Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

    arXiv:2603.15988v2 Announce Type: replace-cross Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits r…