Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 11h

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

Researchers have developed X-OPD, a framework for improving the capabilities of end-to-end speech Large Language Models (LLMs). It targets the performance gap that often separates end-to-end speech LLMs from their text-based counterparts, a gap that standard training techniques fail to close. In X-OPD, a text-based teacher model gives feedback on responses that the speech LLM generates itself (hence "on-policy"), distilling the teacher's knowledge into the student's multimodal representations. According to the authors' experiments, X-OPD substantially narrows this gap on complex tasks while preserving the speech LLM's existing abilities.
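The brief does not give X-OPD's actual loss. The sketch below illustrates only the generic idea of cross-modal on-policy distillation, and it is not necessarily X-OPD's exact objective. A student that sees a speech input samples its own response. At every prefix the student visits, a teacher that sees the text transcript of the same utterance supplies its own next-token distribution. The per-token reverse KL between the two distributions is then averaged. `student_policy`, `teacher_policy`, the toy vocabulary and the feature values are all hypothetical stand-ins for real models.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "<eos>"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def student_policy(speech_features, prefix):
    # Hypothetical speech LLM: next-token distribution given audio features + prefix.
    eos_bias = 0.5 * len(prefix)  # longer prefixes make <eos> more likely
    return softmax([speech_features[0], speech_features[1], 0.0, eos_bias])

def teacher_policy(transcript, prefix):
    # Hypothetical text teacher: conditions on the transcript of the same utterance.
    eos_bias = 0.5 * len(prefix)
    yes_score = 2.0 if "question" in transcript else 0.0
    return softmax([yes_score, 0.0, 0.0, eos_bias])

def sample(probs, rng):
    # Draw one token index from a categorical distribution.
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def reverse_kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), with p = student, q = teacher.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def on_policy_distill_loss(speech_features, transcript, rng, max_len=8):
    """Average per-token reverse KL along a trajectory sampled from the student.

    A real trainer would backpropagate this value into the student's
    parameters; this sketch only computes it.
    """
    prefix, per_token = [], []
    for _ in range(max_len):
        p_student = student_policy(speech_features, prefix)
        p_teacher = teacher_policy(transcript, prefix)
        per_token.append(reverse_kl(p_student, p_teacher))
        tok = VOCAB[sample(p_student, rng)]  # on-policy: the student picks the next token
        prefix.append(tok)
        if tok == "<eos>":
            break
    return sum(per_token) / len(per_token), prefix
```

For example, `on_policy_distill_loss([0.0, 2.0], "a question", random.Random(0))` returns a positive loss, because the student prefers "no" while the teacher prefers "yes". When the student's distributions match the teacher's, the loss is zero. Sampling from the student rather than replaying teacher outputs means the teacher corrects the states the student actually reaches. This is the usual motivation for on-policy over off-policy distillation.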

IMPACT This framework could lead to more capable and aligned speech-based AI systems, reducing the performance disparity with text-only models.

LLMs
Di Cao
X-OPD