Researchers have introduced HeadRouter, a novel method for compressing large audio language models by dynamically pruning audio tokens. Unlike previous approaches that assume uniform head importance, HeadRouter recognizes that different attention heads in these models contribute differently depending on the audio task. This training-free technique identifies and leverages the importance of specific attention heads to retain critical tokens, yielding significant compression without sacrificing performance. Experiments show HeadRouter achieves state-of-the-art compression, even outperforming the original models on benchmarks like AudioMarathon and MMAU-Pro when retaining a substantial portion of tokens.
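The core idea described above — weighting audio tokens by per-head attention and keeping the top-scoring ones — can be sketched as follows. This is an illustrative heuristic under assumed shapes, not HeadRouter's actual algorithm; the entropy-based importance estimate and the function names are hypothetical.

```python
import numpy as np

def head_importance_from_entropy(attn, eps=1e-9):
    """Assign higher weight to heads with sharper (lower-entropy) attention.

    attn: (num_heads, num_queries, num_audio_tokens) attention weights.
    This is one plausible importance heuristic, not necessarily the paper's.
    """
    p = attn.mean(axis=1) + eps                    # (H, T) avg attention per head
    p = p / p.sum(axis=-1, keepdims=True)          # renormalize to a distribution
    entropy = -(p * np.log(p)).sum(axis=-1)        # (H,) per-head entropy
    w = np.exp(-entropy)                           # sharper head -> larger weight
    return w / w.sum()

def prune_audio_tokens(attn, head_importance, keep_ratio=0.5):
    """Keep the audio tokens that receive the most head-weighted attention.

    Returns the sorted indices of retained tokens.
    """
    per_head_scores = attn.mean(axis=1)            # (H, T) attention each token gets
    token_scores = head_importance @ per_head_scores  # (T,) importance-weighted score
    k = max(1, int(round(keep_ratio * attn.shape[-1])))
    return np.sort(np.argsort(token_scores)[::-1][:k])

# Toy example: 2 heads, 1 query, 4 audio tokens.
attn = np.array([[[0.1, 0.2, 0.3, 0.4]],
                 [[0.4, 0.3, 0.2, 0.1]]])
w = head_importance_from_entropy(attn)
kept = prune_audio_tokens(attn, np.array([1.0, 0.0]), keep_ratio=0.5)
print(kept)  # tokens favored by head 0 survive
```

Because pruning is a pure post-hoc selection over attention statistics, it needs no retraining, which matches the training-free framing of the summary.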
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a training-free method to significantly reduce inference costs for large audio language models by optimizing token pruning.
RANK_REASON This is a research paper introducing a new method for audio token pruning in large audio language models.