Researchers have explored zero-shot voice cloning as a way to augment training data for automatic speech recognition (ASR) on dysarthric speech. They cloned speakers from the TORGO dataset with Higgs Audio V2 and used the cloned speech to fine-tune the Whisper-medium model. This approach achieved a Word Error Rate (WER) of 26.00%, which is competitive with models trained on real or hybrid data, and it notably outperformed real-data training for speakers with moderate to severe dysarthria. The findings suggest that zero-shot cloning offers a scalable answer to the data scarcity problem in dysarthric ASR.
AI IMPACT This research offers a scalable method to improve ASR for dysarthric speech, potentially increasing the accessibility and usability of voice-enabled technologies for people with speech impairments.
RANK_REASON The cluster contains an academic paper detailing a new research methodology for improving ASR models.
AI-generated summary · Google Gemini · from 2 sources.