New LM-SPT method enhances speech tokenization for better language model alignment

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have developed LM-SPT, a speech tokenization method that aims to improve the alignment between speech tokens and language models. Unlike previous approaches that distill features directly or rely on pooling, LM-SPT uses semantic speech-resynthesis distillation. This indirect supervision encourages the tokenizer to learn dedicated semantic units that align more closely with language models, even at reduced frame rates. The method reportedly achieves superior performance on automatic speech recognition and text-to-speech tasks without sacrificing speech reconstruction fidelity.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Daejin Jo, Jeeyoung Yun, Byungseok Roh, Sungwoong Kim · 2026-06-16 04:00

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

arXiv:2506.16738v2 · Abstract: With the rapid progress of speech language models (SLMs), discrete speech tokens have emerged as a core interface between speech and text, enabling unified modeling across modalities. Recent speech tokenization approaches …

