WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
Researchers have developed WhisTLE, a method for adapting pretrained automatic speech recognition (ASR) models using only text data. The technique uses a variational autoencoder to model the encoder's outputs and fine-tunes the decoder, optionally in combination with text-to-speech synthesis. WhisTLE reduces word error rates and outperforms other adaptation methods in most tested scenarios, without adding runtime cost.
AI IMPACT: Offers a more efficient way to adapt ASR models to specific domains using only text, potentially improving accuracy in specialized applications.
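To make the variational autoencoder step in the summary more concrete, here is a minimal sketch of a generic VAE training objective: a reconstruction term plus a KL penalty on the latent. The summary does not state which loss WhisTLE actually uses, so the Gaussian latent, the mean-squared-error reconstruction term, the `beta` weight, and all function names below are assumptions chosen for illustration. In this reading, `target` stands in for one encoder output vector and `pred` for the autoencoder's reconstruction of it.

```python
import math
import random


def kl_diag_gaussian(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def mse(pred, target):
    """Mean squared error between a reconstruction and its target vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def vae_loss(pred, target, mu, log_var, beta=1.0):
    """Generic VAE objective (assumed form): reconstruction error + beta * KL."""
    return mse(pred, target) + beta * kl_diag_gaussian(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.5], [0.0, 0.0]   # hypothetical posterior parameters
    z = reparameterize(mu, log_var, rng)     # latent sample a decoder network would consume
    target = [1.0, 2.0]                      # stand-in for an encoder output vector
    pred = [1.0, 4.0]                        # stand-in for the reconstruction
    print(len(z), vae_loss(pred, target, mu, log_var))
```

A real system would replace the hand-written vectors with neural network outputs and train over many frames; the sketch only shows how the two loss terms combine.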