EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
Researchers have developed EchoDistill, a self-distillation framework designed to make Audio Large Language Models (ALLMs) more robust to real-world noise. The method aligns a student model that receives noisy audio with clean-audio references from a teacher model, using policy optimization to guide the student's responses. Experiments show that EchoDistill significantly improves semantic reliability and task performance under noisy conditions, with notable gains in metrics such as GSR and accuracy.
AI IMPACT: Enhances the reliability of audio-based AI models in noisy real-world environments, potentially improving user experience and task completion.