Locality Matters for Training-Free Audio Token Compression in Audio-Language Models
Researchers have developed Local Temporal Bipartite Merging (LTBM), a training-free method for compressing audio tokens in audio-language models. LTBM merges similar audio tokens only when they lie within a local temporal window of each other, with the aim of reducing inference cost and memory usage. Experiments suggest that this locality-aware merging is particularly beneficial for audio captioning, especially at higher compression rates, whereas global matching performs better on audio understanding tasks.
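The merging idea can be illustrated with a minimal sketch. It assumes a ToMe-style bipartite split (even positions form set A, odd positions form set B), cosine similarity between token embeddings, a window measured in token positions, and plain averaging of merged tokens. The function name `local_bipartite_merge` and its parameters `r` and `window` are hypothetical, and the sketch is not the authors' implementation.

```python
import numpy as np


def local_bipartite_merge(x: np.ndarray, r: int, window: int) -> np.ndarray:
    """Hypothetical sketch: merge up to r tokens of x (shape [T, D]).

    Assumptions: even positions form set A, odd positions form set B, and an
    A token may only merge into a B token at most `window` positions away.
    Merged tokens are combined with a plain (unweighted) mean.
    """
    T = x.shape[0]
    a_idx = np.arange(0, T, 2)  # positions of set A (even)
    b_idx = np.arange(1, T, 2)  # positions of set B (odd)
    if r <= 0 or len(b_idx) == 0:
        return x.copy()

    # Cosine similarity between every A token and every B token.
    normed = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = normed[a_idx] @ normed[b_idx].T  # shape [|A|, |B|]

    # Locality constraint: forbid matches more than `window` positions apart.
    dist = np.abs(a_idx[:, None] - b_idx[None, :])
    sim = np.where(dist <= window, sim, -np.inf)

    best_b = sim.argmax(axis=1)  # best B partner for each A token
    best_sim = sim.max(axis=1)   # -inf if no B token is inside the window

    # Only A tokens with at least one in-window partner can be merged.
    r = min(r, int(np.isfinite(best_sim).sum()))
    if r == 0:
        return x.copy()

    # Pick the r A tokens with the most similar partners.
    merged_a = np.argsort(-best_sim, kind="stable")[:r]

    # Fold each chosen A token into its B partner, averaging everything
    # that lands on the same B token.
    b_sum = x[b_idx].astype(float)
    b_cnt = np.ones(len(b_idx))
    for i in merged_a:
        b_sum[best_b[i]] += x[a_idx[i]]
        b_cnt[best_b[i]] += 1

    out = x.astype(float)
    out[b_idx] = b_sum / b_cnt[:, None]

    # Drop merged A tokens; surviving tokens keep their temporal order.
    keep = np.ones(T, dtype=bool)
    keep[a_idx[merged_a]] = False
    return out[keep]


rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 4))  # 10 audio tokens, 4-dim embeddings
compressed = local_bipartite_merge(tokens, r=3, window=2)
print(compressed.shape)  # (7, 4)
```

In this sketch, setting `window` to the sequence length or more removes the locality constraint. That roughly corresponds to the unrestricted (global) bipartite matching the summary contrasts with, so the same function can be used to compare both settings.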
IMPACT: This compression technique could enable more efficient deployment of audio-language models in resource-constrained environments.