X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

New X-OPD framework aligns speech LLMs with text models

By the PulseAugur editorial team · [1 source] · 2026-06-15 04:00

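Researchers have developed X-OPD, a new framework designed to improve the capabilities of speech-based large language models (LLMs). The method addresses the performance gap that often appears between end-to-end speech LLMs and their text counterparts, a gap that standard training techniques fail to close. X-OPD uses a text-based teacher model to provide feedback on the speech LLM's own exploration, effectively distilling the teacher's knowledge into the student's cross-modal representations. Experiments show that X-OPD significantly narrows this gap on complex tasks while preserving the speech LLM's inherent capabilities.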
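As a rough illustration of the on-policy distillation idea summarized above, the sketch below samples a response from the student's own distribution and then scores every sampled position against a teacher. It is a toy under stated assumptions, not the paper's implementation: the per-token reverse KL loss, the function names (`sample_rollout`, `on_policy_distill_loss`), and the stand-in models `toy_student` and `toy_teacher` are all hypothetical. In a real setting the student would be conditioned on speech input and the teacher on the corresponding text.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at a single token position (assumed loss)."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
    )


def sample_rollout(student_step, length, rng):
    """Sample tokens from the student's own distribution (the on-policy part)."""
    tokens, dists = [], []
    for _ in range(length):
        probs = softmax(student_step(list(tokens)))
        tok = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(tok)
        dists.append(probs)
    return tokens, dists


def on_policy_distill_loss(student_step, teacher_step, length, rng):
    """Average per-token reverse KL over a student-sampled sequence."""
    tokens, student_dists = sample_rollout(student_step, length, rng)
    loss = 0.0
    for t, p_student in enumerate(student_dists):
        # The teacher scores the same prefix the student actually produced.
        p_teacher = softmax(teacher_step(tokens[:t]))
        loss += reverse_kl(p_student, p_teacher)
    return loss / length


# Hypothetical stand-ins: each maps a token prefix to logits over 3 tokens.
def toy_student(prefix):
    return [0.0, 0.0, 0.0]


def toy_teacher(prefix):
    return [2.0, 0.0, 0.0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(on_policy_distill_loss(toy_student, toy_teacher, 4, rng))
```

The point of sampling from the student rather than from a fixed dataset is that the teacher's feedback lands on the states the student actually visits; minimizing the loss would then pull the student toward the teacher on those states.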

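Impact: The framework could lead to more capable and better-aligned speech-based AI systems, narrowing the performance gap with text-only models.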

Ranking rationale: This cluster contains a research paper detailing a new framework for speech LLMs.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Di Cao, Dongjie Fu, Hai Yu, Siqi Zheng, Xu Tan, Tao Jin · 2026-06-15 04:00

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

arXiv:2603.24596v3 Announce Type: replace-cross Abstract: While the shift from cascaded dialogue systems to end-to-end (E2E) speech Large Language Models (LLMs) improves latency and paralinguistic modeling, E2E models often exhibit a significant performance degradation compared t…
