Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 2w

Locality Matters for Training-Free Audio Token Compression in Audio-Language Models

Researchers have proposed Local Temporal Bipartite Merging (LTBM), a training-free method for compressing audio tokens in audio-language models. It merges similar nearby audio tokens within a temporal window to reduce inference cost and memory usage. Experiments suggest that this locality-aware merging helps most on audio captioning tasks, especially at higher compression rates, while global matching performs better on audio understanding tasks.
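To make the idea of merging similar nearby tokens inside a temporal window more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes an alternating even/odd bipartite split (as in earlier token-merging work), cosine similarity, and simple averaging of merged tokens. The names `cosine`, `local_bipartite_merge`, `r`, and `window` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def local_bipartite_merge(tokens, r, window):
    """Merge up to r tokens into similar neighbours within `window` time steps.

    Assumption (not from the source): even-indexed tokens are merge sources,
    odd-indexed tokens are merge destinations.
    """
    n = len(tokens)
    candidates = []
    for i in range(0, n, 2):            # source tokens
        best = None
        for j in range(1, n, 2):        # destination tokens
            if abs(i - j) <= window:    # locality constraint
                s = cosine(tokens[i], tokens[j])
                if best is None or s > best[0]:
                    best = (s, j)
        if best is not None:
            candidates.append((best[0], i, best[1]))

    # Keep the r most similar (source, destination) pairs.
    candidates.sort(key=lambda c: c[0], reverse=True)
    merges = candidates[:r]

    sums = {k: list(tokens[k]) for k in range(n)}
    counts = {k: 1 for k in range(n)}
    removed = set()
    for _, i, j in merges:
        sums[j] = [a + b for a, b in zip(sums[j], tokens[i])]
        counts[j] += 1
        removed.add(i)

    # Surviving tokens stay in temporal order; merged ones are averaged.
    return [[x / counts[k] for x in sums[k]] for k in range(n) if k not in removed]


tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
compressed = local_bipartite_merge(tokens, r=2, window=1)
print(len(tokens), "->", len(compressed))
```

In this sketch, setting `window` to at least the sequence length removes the locality constraint, which gives a rough stand-in for the global matching that the summary contrasts LTBM with.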

IMPACT This compression technique could enable more efficient deployment of audio-language models in resource-constrained environments.

Clotho
Qwen2-Audio
Audio Flamingo 3
Local Temporal Bipartite Merging
AudioCaps