Brief · PulseAugur

RESEARCH · arXiv cs.AI

Probing Low Frame Rate Degradation in Neural Audio Codecs

Researchers investigated why neural audio codecs degrade at low frame rates, a regime that benefits autoregressive speech synthesis because fewer frames mean shorter sequences to generate. Their study found that a previously observed quality cliff at 6.25 Hz stemmed from a suboptimal training configuration rather than from phonemic collisions or codebook saturation. With the configuration corrected, word error rate degraded smoothly down to 1.6 Hz, suggesting that the efficiency gains of low-frame-rate codecs are more attainable than previously thought.
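
To make the efficiency argument concrete, the following rough sketch counts how many codec frames an autoregressive model would have to generate for a fixed stretch of audio at different frame rates. It is an illustration, not the study's method: the helper `frames_for`, the 10-second duration, and the 50 Hz reference rate are all assumptions chosen for the example; only 6.25 Hz and 1.6 Hz come from the summary above. It also assumes one generation step per frame, which may not hold for codecs that emit several codebook tokens per frame.

```python
import math


def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Number of codec frames needed to cover duration_s seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


# 50 Hz is a hypothetical reference rate; 6.25 Hz and 1.6 Hz are from the brief.
rates_hz = [50.0, 6.25, 1.6]
duration_s = 10.0  # assumed utterance length, for illustration only

counts = {rate: frames_for(duration_s, rate) for rate in rates_hz}
for rate, n_frames in counts.items():
    print(f"{rate:>5} Hz -> {n_frames} frames for {duration_s:.0f} s of audio")
```

Under these assumptions, the number of generation steps scales linearly with the frame rate, which is why a codec that stays intelligible at lower rates would shorten the sequences a synthesis model has to produce.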

IMPACT Could make speech synthesis models more efficient by making lower codec frame rates practical.

speech synthesis
decoder
Neural Audio Codecs
Codebook Saturation
arXiv
Autoregressive Speech Synthesis
word error rate
Phonemic Collisions