Probing Low Frame Rate Degradation in Neural Audio Codecs
Researchers have investigated how neural audio codecs degrade at low frame rates, which benefit autoregressive speech synthesis because fewer frames per second means shorter token sequences to generate. The study found that a previously observed quality cliff at 6.25 Hz was caused not by phonemic collisions or codebook saturation but by a suboptimal training configuration. With that configuration corrected, word error rate (WER) degraded smoothly as the frame rate was lowered to 1.6 Hz. This suggests that the efficiency gains of low-frame-rate codecs are more attainable than previously thought.
IMPACT: Improves the efficiency of speech synthesis models by making lower codec frame rates viable.
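Low frame rates help autoregressive synthesis mainly through sequence length. A codec emitting f frames per second turns d seconds of audio into about d·f frames for the model to generate. The following is a minimal sketch of that arithmetic. It makes a hypothetical assumption of one token per frame per codebook, with codebooks flattened into a single sequence. Real codecs may instead interleave codebooks or predict them in parallel. The helper name `tokens_for_audio` is illustrative only, and 12.5 Hz is included only as a comparison point.

```python
import math


def tokens_for_audio(duration_s: float, frame_rate_hz: float, num_codebooks: int = 1) -> int:
    """Hypothetical helper: tokens needed to cover `duration_s` seconds of audio.

    Assumes one token per frame per codebook, with codebooks flattened
    into a single autoregressive sequence.
    """
    # Round up so a partial final frame still gets a token.
    frames = math.ceil(duration_s * frame_rate_hz)
    return frames * num_codebooks


if __name__ == "__main__":
    # 6.25 Hz and 1.6 Hz are the rates discussed above; 12.5 Hz is for comparison.
    for rate in (12.5, 6.25, 1.6):
        print(f"{rate:>5} Hz -> {tokens_for_audio(10.0, rate)} tokens for 10 s of audio")
```

Under these assumptions, dropping from 6.25 Hz to 1.6 Hz cuts the sequence for 10 s of audio from 63 tokens to 16. This roughly fourfold reduction is why a smooth WER curve at low rates matters for efficiency.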