New ASR training method boosts performance on large datasets

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have developed a method that improves automatic speech recognition (ASR) models by making better use of large-scale, weakly supervised datasets. The approach has three steps: initial pretraining on the full dataset, continued pretraining on a subset filtered by character error rate (CER), and fine-tuning on a small selection of acoustically similar samples. In experiments with a 90,000-hour Japanese dataset, filtering and selection independently reduced CER by up to 6.4% and 4.0%, respectively.
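The CER-based filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes CER is measured between each weak label and a hypothesis produced by the pretrained model, and the sample fields (`label`, `hypothesis`), the function names, and the `max_cer` threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        # Convention chosen for this sketch: an empty reference only
        # matches an empty hypothesis.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def filter_by_cer(samples: List[Dict[str, str]],
                  max_cer: float = 0.1) -> List[Dict[str, str]]:
    """Keep samples whose weak label agrees closely with the model output.

    `max_cer` is a hypothetical threshold, not a value from the paper.
    """
    return [s for s in samples if cer(s["label"], s["hypothesis"]) <= max_cer]


# Toy data: the second sample has one wrong character out of five.
samples = [
    {"id": "a", "label": "こんにちは", "hypothesis": "こんにちは"},
    {"id": "b", "label": "こんにちは", "hypothesis": "こんにちわ"},
]
kept = filter_by_cer(samples, max_cer=0.1)
```

With the threshold set to 0.1, only the first sample is kept, because the second has a CER of 0.2. A stricter threshold yields cleaner labels at the cost of discarding more audio, so in practice it would have to be tuned.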

IMPACT The method shows how to get more out of noisy, large-scale datasets when training ASR models, which could lead to more accurate speech recognition systems.

RANK_REASON The cluster contains a single academic paper detailing a novel method for improving ASR models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Kohei Matsuura, Masato Mimura · 2026-06-30 04:00

Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

arXiv:2606.28728v1 Announce Type: cross Abstract: Leveraging large-scale weakly supervised datasets is crucial to train robust end-to-end automatic speech recognition (ASR) models. However, such datasets often contain noisy labels and lack domain specificity, limiting their effec…
