Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

Researchers have developed WavSLM, a speech language model that simplifies coherent speech generation by distilling self-supervised WavLM representations into a single codebook. This lets WavSLM model semantic and acoustic information jointly within a single token stream, without text supervision or pretraining. Despite its streamlined architecture, WavSLM achieves competitive performance on speech generation and consistency benchmarks while using fewer parameters and less training data, and it supports streaming inference.
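To make the "single codebook, single token stream" idea concrete, here is a minimal sketch of how continuous frame-level features could be mapped to one sequence of discrete tokens. It assumes a plain nearest-neighbor lookup, which the brief does not specify; the function name `quantize_to_single_stream`, the toy codebook, and the toy features are all hypothetical and are not taken from WavSLM or WavLM.

```python
import numpy as np

def quantize_to_single_stream(features, codebook):
    """Return, for each frame, the index of its nearest codebook vector.

    Assumption: nearest-neighbor (Euclidean) assignment. This is a generic
    vector-quantization step, not the paper's confirmed procedure.
    """
    # features: (T, D), codebook: (K, D) -> pairwise squared distances: (T, K)
    diffs = features[:, None, :] - codebook[None, :, :]
    distances = np.sum(diffs ** 2, axis=-1)
    # One token id per frame, so the output is a single token stream.
    return np.argmin(distances, axis=1)

# Toy 2-D codebook with 4 entries (illustrative values only).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Toy "speech features": 5 frames, each close to one codebook entry.
features = np.array([
    [0.1, -0.1],
    [0.9, 0.2],
    [0.8, 0.9],
    [0.1, 1.1],
    [0.9, 0.1],
])

tokens = quantize_to_single_stream(features, codebook)
print(tokens.tolist())
```

In this sketch every frame yields exactly one token id, so a downstream autoregressive model would consume a single sequence rather than several parallel codebook streams.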

Hugging Face
arXiv
DagsHub
alphaXiv
ScienceCast
CatalyzeX
Gotit.pub
WavLM
IArxiv
WavSLM
Luca Della Libera