Brief · PulseAugur

RESEARCH · arXiv cs.CL · 1d · [4 sources]

InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

Researchers have proposed three methods for compressing the visual tokens consumed by large vision-language models (VLMs), with the goal of cutting computational overhead and speeding up inference. InfoMerge uses temporal fingerprint differences and content-aware allocation, ETC applies task-aware visual information distillation, and EvoCut analyzes how tokens evolve across multiple layers. The approaches demonstrate substantial reductions in token count, with some retaining over 98% of original performance while delivering significant speedups.

IMPACT These techniques make VLMs markedly more efficient, which could accelerate deployment and lower operating costs for AI applications that involve visual understanding.

LLaVA-1.5-7B
Qwen3-VL-2B
EvoCut
InfoMerge
LLaVA-OneVision-7B