New text-only method adapts speech recognition models

Researchers have developed WhisTLE, a method for adapting pretrained automatic speech recognition (ASR) models using only text data. It uses a variational autoencoder to model the ASR encoder's outputs and fine-tunes the decoder, optionally in combination with text-to-speech synthesis. WhisTLE lowers word error rate and outperforms other adaptation methods in most tested scenarios, with no added runtime cost.
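For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not code from the paper; the function name `word_error_rate` and the sample sentences are hypothetical, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical domain-specific example: one substitution plus one insertion
# against a five-word reference.
wer = word_error_rate(
    "the patient has atrial fibrillation",
    "the patient has a trial fibrillation",
)
print(f"WER = {wer:.2f}")
```

Errors like the split of a domain term in this example are the kind that domain adaptation aims to reduce.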

IMPACT Offers a more efficient way to adapt ASR models to specific domains using only text, potentially improving accuracy in specialized applications.

RANK_REASON Academic paper detailing a new method for domain adaptation of ASR models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Akshat Pandey, Karun Kumar, Raphael Tang

    WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

    arXiv:2509.10452v2. Abstract: Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen parlance. In many real-world settings, collecting speech data is impractical, necessitating text…