Researchers have investigated how neural audio codecs degrade at low frame rates. Low frame rates are attractive for autoregressive speech synthesis because they shorten the sequences the model must generate. The study found that a previously observed quality cliff at 6.25 Hz was caused not by phonemic collisions or codebook saturation but by a suboptimal training configuration. Once that configuration was corrected, word error rate degraded smoothly down to 1.6 Hz. This suggests the efficiency gains of low-frame-rate codecs are more attainable than previously thought.
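The efficiency argument is about sequence length. An autoregressive model takes roughly one decoding step per codec frame, so the frame rate sets how many steps a clip of a given length requires. The sketch below is a minimal illustration under two assumptions that do not come from the study. It assumes one token per frame, although real codecs often emit several codebook tokens per frame. It also compares against a 50 Hz baseline, which is hypothetical. The 6.25 Hz and 1.6 Hz rates are the ones discussed above, and `frames_for_clip` is an illustrative helper.

```python
import math


def frames_for_clip(duration_s: float, frame_rate_hz: float) -> int:
    """Codec frames needed to cover `duration_s` seconds at `frame_rate_hz`.

    Assumption: one autoregressive step per frame. Codecs that emit several
    codebook tokens per frame would multiply this count.
    """
    return math.ceil(duration_s * frame_rate_hz)


# 50 Hz is a hypothetical high-rate baseline; 6.25 Hz and 1.6 Hz are the
# rates mentioned in the summary.
for rate in (50.0, 6.25, 1.6):
    print(f"{rate:>5} Hz -> {frames_for_clip(10.0, rate)} frames for a 10 s clip")
```

Under these assumptions, a 10-second clip needs 500 steps at 50 Hz, 63 at 6.25 Hz, and 16 at 1.6 Hz. This is why avoiding a quality cliff at low rates matters for generation cost.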
IMPACT Could improve speech synthesis efficiency by making lower codec frame rates practical.
RANK_REASON The cluster contains an academic paper detailing research findings on neural audio codecs.
- decoder
- Neural Audio Codecs
- speech synthesis
- arXiv
- Autoregressive Speech Synthesis
- Codebook Saturation
- Phonemic Collisions
- word error rate
AI-generated summary · Google Gemini · from 2 sources.