WavLM advances vocal effort classification with data augmentation

By PulseAugur Editorial · [1 source] · 2026-06-29 04:00

Researchers have advanced speaker-based vocal effort classification using the WavLM model, which outperformed previous approaches built on Wav2Vec2 and HuBERT. To combat data scarcity, they systematically studied augmentation strategies including room impulse response (RIR) convolution, additive noise, time masking, speed perturbation, band-limiting, MixUp, and CutMix, which consistently improved WavLM performance. Gaussian-neighbor soft labels, which model vocal effort as a continuum, brought further gains by reducing confusion between adjacent categories. The best-performing system, WavLM-BASE with gradual unfreezing, augmentation, and soft labels, reached a new state-of-the-art accuracy of 78.2% on the AVID corpus.
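To make the soft-label idea concrete, here is a minimal sketch of one way Gaussian-neighbor soft labels could be built for ordered vocal effort classes. It is an illustration only, not the paper's implementation: the class list follows the example range in the abstract, and the function names, the `sigma` value, and the normalization are assumptions.

```python
import math

# Ordered vocal effort classes (assumed from the abstract's example range).
CLASSES = ["whisper", "soft", "neutral", "loud", "shout"]


def gaussian_soft_label(true_index, num_classes=len(CLASSES), sigma=0.5):
    """Return a probability vector peaked at true_index.

    Mass falls off with a Gaussian of the class-index distance, so adjacent
    effort levels receive more weight than distant ones. sigma is a
    hypothetical smoothing width, not a value taken from the paper.
    """
    if not 0 <= true_index < num_classes:
        raise ValueError("true_index out of range")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights = [
        math.exp(-((k - true_index) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(predicted_probs, target_probs, eps=1e-12):
    """Cross-entropy between a soft target and predicted class probabilities."""
    return -sum(
        t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs)
    )


if __name__ == "__main__":
    target = gaussian_soft_label(CLASSES.index("neutral"))
    for name, prob in zip(CLASSES, target):
        print(f"{name:>8}: {prob:.4f}")
```

With a target like this, a prediction that confuses "neutral" with "soft" is penalized less than one that confuses "neutral" with "shout", which is the intuition behind treating vocal effort as a continuum.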

IMPACT Improves robustness of speech technologies by enhancing vocal effort classification.

RANK_REASON Academic paper detailing a new state-of-the-art result on a specific benchmark.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Zahra Omidi, John H. L. Hansen · 2026-06-29 04:00

Advancing Speaker-Based Vocal Effort Classification with WavLM and Data Augmentation in Naturalistic Non-Calibrated Speech Recordings

arXiv:2606.27543v1 Announce Type: cross Abstract: The variations in vocal effort range (e.g. whisper, soft, neutral, loud, shout) alter production and speech acoustics, reducing intelligibility and limiting the robustness of any subsequent speech technology. Classification is cha…
