Pretrained self-supervised speech models can recognize unseen consonants
Researchers investigated whether self-supervised speech models can accurately recognize uncommon speech sounds, specifically click consonants found in Khoisan languages. By fine-tuning models like Wav2Vec2 and HuBERT on data from G|ui and West !Xoon, they found that these models could indeed recognize clicks more effectively than non-click sounds. This suggests that self-supervised learning allows these models to generalize across a wider range of human phonemes, even those rarely encountered in typical training data.
AI IMPACT Demonstrates self-supervised models can generalize to rare phonemes, potentially improving low-resource language ASR.