This paper introduces novel methods for enhancing speech recognition models by leveraging text-only data. The research focuses on encoder-dominated architectures, demonstrating that a larger encoder paired with a smaller decoder can match or exceed the performance of models with larger decoders. The study also found that simpler configurations, such as random duration models, often outperform more complex approaches, streamlining the training process. All associated code and experimental setups are publicly released.
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT Presents a simplified training pipeline for speech recognition models, potentially lowering barriers to entry for researchers and developers.
RANK_REASON Academic paper detailing new methods for speech recognition models.