WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation
Researchers have developed WavSLM, a speech language model that simplifies coherent speech generation by distilling self-supervised WavLM representations into a single codebook. Because every frame maps to one code, WavSLM models semantic and acoustic information jointly in a single token stream, with no text supervision or text pretraining. Despite this streamlined architecture, WavSLM performs competitively on speech generation and consistency benchmarks while using fewer parameters and less training data, and it supports streaming inference.
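
To make the idea of a single token stream concrete, the sketch below assigns each feature frame to its nearest entry in one codebook, giving exactly one token per frame. It is only an illustration of generic nearest-neighbor vector quantization, not WavSLM's actual quantizer or training procedure: the function name `quantize_to_single_stream`, the toy identity codebook, and the synthetic frames are all hypothetical stand-ins for real WavLM features and a learned codebook.

```python
import numpy as np

def quantize_to_single_stream(features, codebook):
    """Map each frame (row of `features`) to the index of its nearest code."""
    # Squared Euclidean distance between every frame and every code: shape (T, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # One token per frame: a single stream rather than several parallel codebooks.
    return dists.argmin(axis=1)

# Hypothetical toy setup: 4 codes in a 4-dimensional feature space.
codebook = np.eye(4)
# Four synthetic frames, each a slightly shifted copy of a known code.
frames = codebook[[2, 2, 0, 3]] + 0.05

tokens = quantize_to_single_stream(frames, codebook)
print(tokens.tolist())
```

In a real system, an autoregressive model would then be trained on such token sequences. The summary above does not describe that stage, so it is left out of this sketch.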