Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 8h

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Researchers have developed Whisfusion, a non-autoregressive automatic speech recognition (ASR) system built on masked diffusion models. The approach aims to match the accuracy of autoregressive models while substantially speeding up inference. Whisfusion trains a diffusion decoder on top of frozen Whisper-large-v3 audio embeddings, which enables parallel decoding; it is reported to outperform existing models in both speed and accuracy across multiple languages.
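
To make "parallel decoding with masked diffusion" concrete, here is a minimal Python sketch of confidence-based iterative unmasking, a common decoding scheme for masked-token models. It is not taken from the Whisfusion paper: the brief does not describe the exact sampler, and the names `MASK`, `make_toy_denoiser`, and `parallel_decode` are hypothetical. The toy denoiser stands in for a real diffusion decoder conditioned on audio embeddings; it simply returns tokens from a fixed reference transcript with made-up confidence scores.

```python
import math

MASK = "<mask>"  # hypothetical placeholder symbol for a not-yet-decoded token


def make_toy_denoiser(reference):
    """Build a stand-in for a diffusion decoder (assumption, not a real model).

    The returned function predicts a (token, confidence) pair for every
    position in one call, which is what makes the decoding parallel.
    """
    def denoise(tokens):
        predictions = []
        for i, ref_token in enumerate(reference):
            # Toy rule: confidence grows with the number of revealed neighbours.
            revealed = sum(
                1 for j in (i - 1, i + 1)
                if 0 <= j < len(tokens) and tokens[j] != MASK
            )
            predictions.append((ref_token, 0.5 + 0.25 * revealed))
        return predictions
    return denoise


def parallel_decode(denoise, length, steps):
    """Start fully masked, then unmask the most confident positions each step."""
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        predictions = denoise(tokens)  # one call scores all positions
        # Spread the remaining masked positions evenly over the remaining steps.
        quota = math.ceil(len(masked) / (steps - step))
        chosen = sorted(masked, key=lambda i: predictions[i][1], reverse=True)[:quota]
        for i in chosen:
            tokens[i] = predictions[i][0]
    return tokens


reference = "parallel decoding fills many tokens at once".split()
hypothesis = parallel_decode(make_toy_denoiser(reference), len(reference), steps=3)
print(" ".join(hypothesis))
```

The point of the sketch is the cost model: the decoder is called a fixed number of times (`steps`) regardless of transcript length, whereas an autoregressive decoder needs one call per output token.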

IMPACT Establishes masked diffusion as a viable, high-throughput alternative for multilingual ASR, potentially accelerating real-time transcription applications.

Canary
Qwen3-ASR
Whisper-large-v3
Whisfusion
Taeyoun Kwon