PulseAugur

New ASR Benchmarks and Training Methods Emerge for LLM Era

Researchers are developing new methods to improve automatic speech recognition (ASR) systems, particularly in specialized domains. One approach uses synthetic speech from modern text-to-speech (TTS) systems to train LLM-based ASR models for regulated industries such as banking and healthcare, easing privacy concerns by reducing reliance on real, sensitive recordings. Another paper introduces Preference-ASR, a test set that evaluates whether ASR systems follow user-defined output styles for numbers, disfluencies, entities, and casing, and reveals performance differences that traditional benchmarks miss.

IMPACT Advances in ASR training and evaluation could lead to more accurate and customizable speech recognition systems across various applications.

RANK_REASON Two academic papers introducing new methods and datasets for ASR systems.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Yanis Labrak, Dairazalia Sanchez-Cortes, Sergio Burdisso, Séverin Baroudi, Shashi Kumar, Esaú Villatoro-Tello, Srikanth Madikeri, Manjunath K E, Oldřich Plchot, Kadri Hacioğlu, Petr Motlicek, Andreas Stolcke

    How to Leverage Synthetic Speech for LLM-Based ASR Systems?

    arXiv:2606.29031v1 Announce Type: cross Abstract: In regulated domains such as banking and healthcare, where privacy constraints make real speech costly to collect and retain, synthetic speech from modern text-to-speech (TTS) is an appealing alternative for training automatic spe…

  2. arXiv cs.CL TIER_1 English(EN) · Nithin Rao Koluguri, Sasha Meister, Nikolay Karpov, Piotr Zelasko, Desh Raj, Jagadeesh Balam, Boris Ginsburg

    Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs

    arXiv:2606.29534v1 Announce Type: new Abstract: Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a m…
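To make the normalization problem concrete, here is a minimal sketch of how a typical scoring pipeline can hide formatting errors. It is not the Preference-ASR method, whose scoring details are not given in the excerpt above. The helpers `normalize`, `wer`, `word_errors` and the `FILLERS` list are hypothetical: a toy normalizer that lowercases, strips punctuation and drops filler words, loosely in the spirit of common English WER normalizers. A hypothesis that ignores the user's requested style (keep the disfluency, casing and punctuation) is heavily penalized on raw text but scores a perfect WER once both sides are normalized.

```python
import re

# Hypothetical filler list for illustration only; real normalizers use their own rules.
FILLERS = {"um", "uh", "er", "hmm"}


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return word_errors(ref_words, hyp.split()) / max(len(ref_words), 1)


def normalize(text: str) -> str:
    """Toy normalizer: lowercase, drop punctuation, drop filler words."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(w for w in text.split() if w not in FILLERS)


# The user asked for disfluencies, casing and punctuation to be kept.
reference = "Um, call Dr. Smith at 9 AM."
hypothesis = "call dr smith at 9 am"  # ignores all of those preferences

print(f"raw WER:        {wer(reference, hypothesis):.3f}")  # → 0.571
print(f"normalized WER: {wer(normalize(reference), normalize(hypothesis)):.3f}")  # → 0.000
```

Raw scoring counts one deletion ("Um,") and three case or punctuation substitutions out of seven reference words. After normalization, the two strings are identical, so the style violations become invisible. This is the kind of distinction the Preference-ASR abstract says standard normalizers erase.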