LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization
Researchers have developed LM-SPT, a speech tokenization method designed to improve the alignment between speech tokens and language models. Unlike previous approaches that distill features directly or rely on pooling, LM-SPT supervises the tokenizer indirectly through semantic speech-resynthesis distillation. This indirect supervision encourages dedicated semantic units that align more closely with language models, even at reduced frame rates. The method has shown superior performance on automatic speech recognition and text-to-speech tasks without sacrificing speech reconstruction fidelity.
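The contrast between direct, pooled distillation and indirect supervision through resynthesis can be illustrated with a toy sketch. The summary does not say how the LM-SPT loss is computed, so this is an assumption rather than the authors' implementation: the indirect loss here compares a frozen teacher's features of the original audio with its features of the resynthesized audio. Every name is hypothetical, and the "teacher" is a fixed random linear map standing in for a real pretrained encoder.

```python
import numpy as np

def frame(wave, hop):
    """Split a 1-D waveform into non-overlapping frames of length `hop`."""
    n = len(wave) // hop
    return wave[: n * hop].reshape(n, hop)

def teacher_features(wave, w_teacher):
    """Stand-in for a frozen teacher encoder (assumption: one fixed map per frame)."""
    hop = w_teacher.shape[0]
    return np.tanh(frame(wave, hop) @ w_teacher)

def avg_pool(feats, factor):
    """Average consecutive frames to lower the frame rate by `factor`."""
    n = feats.shape[0] // factor
    return feats[: n * factor].reshape(n, factor, -1).mean(axis=1)

def direct_pooled_loss(student_feats, wave, w_teacher, factor):
    """Direct distillation: match student features to pooled teacher features."""
    target = avg_pool(teacher_features(wave, w_teacher), factor)
    return float(np.mean((student_feats - target) ** 2))

def resynthesis_loss(wave, resynth_wave, w_teacher):
    """Indirect distillation (assumed form): compare teacher features of the
    original audio with teacher features of the resynthesized audio."""
    original = teacher_features(wave, w_teacher)
    resynth = teacher_features(resynth_wave, w_teacher)
    return float(np.mean((original - resynth) ** 2))

rng = np.random.default_rng(0)
wave = rng.standard_normal(64)               # toy "waveform"
w_teacher = rng.standard_normal((4, 8))      # hop of 4 samples, 8-dim features
student_feats = rng.standard_normal((8, 8))  # 16 teacher frames pooled by 2
noisy_resynth = wave + 0.1 * rng.standard_normal(64)

print(direct_pooled_loss(student_feats, wave, w_teacher, factor=2))
print(resynthesis_loss(wave, wave, w_teacher))  # 0.0 for a perfect resynthesis
print(resynthesis_loss(wave, noisy_resynth, w_teacher))
```

In this toy setup the direct loss must pool the teacher features down to the student's frame rate, whereas the resynthesis loss is computed at the teacher's own frame rate regardless of how coarse the tokens are. That is one plausible reason an indirect objective could tolerate reduced frame rates.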