PulseAugur

Neural audio codecs achieve smooth degradation down to 1.6 Hz

Researchers have investigated how neural audio codecs degrade at low frame rates, which are attractive for autoregressive speech synthesis because generation cost scales linearly with sequence length. The study found that a previously observed quality cliff at 6.25 Hz was caused not by phonemic collisions or codebook saturation but by a suboptimal training configuration. Once that configuration was corrected, the word error rate degraded smoothly down to 1.6 Hz. This suggests that the efficiency gains of low frame rate codecs are more attainable than previously thought.
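To make the efficiency argument concrete, here is a minimal sketch of the arithmetic. It takes the abstract's statement that generation cost scales linearly with sequence length at face value. It also assumes one autoregressive step per codec frame, although real codecs may emit several codebook tokens per frame. The helper names `ar_steps` and `relative_cost` are hypothetical and purely illustrative.

```python
import math


def ar_steps(duration_s: float, frame_rate_hz: float) -> int:
    """Codec frames (assumed = autoregressive steps) needed to cover a clip."""
    # Round up so a partial final frame still counts as one step.
    return math.ceil(duration_s * frame_rate_hz)


def relative_cost(frame_rate_hz: float, baseline_hz: float = 12.5) -> float:
    """Generation cost relative to a baseline frame rate, assuming linear scaling."""
    return frame_rate_hz / baseline_hz


if __name__ == "__main__":
    # Compare the frame rates mentioned in the summary for a 10-second clip.
    for rate in (12.5, 6.25, 1.6):
        print(f"{rate:>5} Hz: {ar_steps(10.0, rate):>3} steps, "
              f"{relative_cost(rate):.3f}x the 12.5 Hz cost")
```

Under these assumptions, a 10-second clip needs 125 steps at 12.5 Hz but only 16 at 1.6 Hz, roughly a 7.8x reduction. This is why a smooth quality curve below 6.25 Hz matters.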

IMPACT Could improve the efficiency of speech synthesis models by making lower codec frame rates viable.

RANK_REASON The cluster contains an academic paper detailing research findings on neural audio codecs.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Alex Gichamba, Moise Busogi

    Probing Low Frame Rate Degradation in Neural Audio Codecs

    arXiv:2606.16969v1 (cross-listed) · Abstract: Low frame rates in neural audio codecs are attractive for autoregressive speech synthesis, where the generation cost scales linearly with the sequence length. Recent work has demonstrated that codecs can operate at 12.5 Hz and bel…

  2. arXiv cs.AI TIER_1 English(EN) · Moise Busogi

    Probing Low Frame Rate Degradation in Neural Audio Codecs

    Low frame rates in neural audio codecs are attractive for autoregressive speech synthesis, where the generation cost scales linearly with the sequence length. Recent work has demonstrated that codecs can operate at 12.5 Hz and below, but the mechanisms underlying low frame rate d…