Researchers have advanced speaker-based vocal effort classification by using the WavLM model, which outperformed previous approaches based on Wav2Vec2 and HuBERT. To combat data scarcity, they systematically studied a range of augmentation strategies (RIR convolution, additive noise, time masking, speed perturbation, band-limiting, MixUp, and CutMix), which consistently improved WavLM performance. Gaussian-neighbor soft labels brought further gains: they model vocal effort as a continuum, which reduces confusion between adjacent categories. The best-performing system, WavLM-BASE with gradual unfreezing, augmentation, and soft labels, achieved a new state-of-the-art accuracy of 78.2% on the AVID corpus.
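As a rough illustration of the Gaussian-neighbor soft-label idea, the sketch below spreads each one-hot target over its ordinal neighbors using a Gaussian kernel. The function names, the four-class setup, and the `sigma` value are assumptions made for illustration; the summary does not state the paper's exact formulation or hyperparameters.

```python
import math


def gaussian_soft_labels(target, num_classes, sigma=0.5):
    """Build a soft target peaked at `target` that decays with ordinal distance.

    `sigma` is a hypothetical smoothing width, not a value from the paper.
    """
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Unnormalized Gaussian weight for every class, centered on the true class.
    weights = [
        math.exp(-((k - target) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    # Normalize so the weights form a probability distribution.
    return [w / total for w in weights]


def soft_cross_entropy(log_probs, soft_target):
    """Cross-entropy between model log-probabilities and a soft target."""
    return -sum(t * lp for t, lp in zip(soft_target, log_probs))


# Assumed example: four ordered effort levels, true class is index 1.
example = gaussian_soft_labels(target=1, num_classes=4, sigma=0.5)
```

With targets like these, a prediction that lands on an adjacent effort level is penalized less than one that lands two or more levels away, which is one plausible way to encode the continuum the summary describes.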
IMPACT Improves robustness of speech technologies by enhancing vocal effort classification.
RANK_REASON Academic paper detailing a new state-of-the-art result on a specific benchmark.
AI-generated summary · Google Gemini · from 1 source.