PulseAugur

ASR models advance with new architectures and vast supervised data

Automatic Speech Recognition (ASR) is advancing rapidly, driven by two main factors: the growing availability of pseudo-labeled data and the emergence of new model architectures. Models such as Whisper-large-v3 and NVIDIA Parakeet v3 demonstrate the power of large-scale supervised training, and the discussion asks whether self-supervised learning will be phased out for ASR. Computer vision offers a contrast: self-supervised methods such as DINOv3 perform strongly there, prompting speculation about a similar breakthrough in speech processing.

IMPACT The discussion explores a potential shift from self-supervised to supervised learning in ASR, which could affect future model development and research focus.

RANK_REASON This is a discussion thread on Reddit about the future direction of ASR models, not a primary release or research paper.

Read on r/MachineLearning →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/MachineLearning TIER_1 English(EN) · /u/ComprehensiveTop3297

    What will be the next breakthrough in ASR? [D]

    Hey All,

    I am currently working on ASR models, and I have gathered some recent literature. From my literature search, it seems like the ASR models are getting more and more powerful due to two main things.

    1. Because pseudo-la…