AdaCodec: A Predictive Visual Code for Video MLLMs
AdaCodec is a method for processing video in multimodal large language models (MLLMs). It exploits the temporal redundancy of video: a full frame is transmitted only when the scene changes significantly, and otherwise only the inter-frame differences are encoded. This reduces the visual token budget and speeds up processing, and the method is reported to outperform existing approaches on multiple benchmarks.
AI IMPACT: Reduces computational cost and latency for video MLLMs, enabling more efficient processing of long video content.
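
The full-frame-versus-difference idea described above can be illustrated with a toy sketch. This is not AdaCodec's actual implementation: the frame representation (flat lists of integer pixel values), the mean-absolute-difference change measure, the `threshold` value, and all function names are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Frame = List[int]                 # hypothetical: a flattened grayscale frame
Packet = Tuple[str, List[int]]    # ("full", pixels) or ("delta", differences)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel change between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def encode_stream(frames: List[Frame], threshold: float = 10.0) -> List[Packet]:
    """Emit a full frame on large changes, otherwise only the difference
    to the previous frame. The threshold is an assumed, untuned value."""
    encoded: List[Packet] = []
    previous = None
    for frame in frames:
        if previous is None or mean_abs_diff(frame, previous) > threshold:
            encoded.append(("full", list(frame)))
        else:
            encoded.append(("delta", [x - y for x, y in zip(frame, previous)]))
        previous = frame
    return encoded


def decode_stream(encoded: List[Packet]) -> List[Frame]:
    """Rebuild the frames by adding each delta to the last decoded frame."""
    frames: List[Frame] = []
    for kind, payload in encoded:
        if kind == "full":
            frames.append(list(payload))
        else:
            frames.append([d + p for d, p in zip(payload, frames[-1])])
    return frames


if __name__ == "__main__":
    video = [
        [0, 0, 0, 0],
        [1, 0, 0, 0],          # small change: stored as a delta
        [100, 100, 100, 100],  # scene change: stored as a full frame
        [100, 101, 100, 100],  # small change: stored as a delta
    ]
    packets = encode_stream(video)
    print([kind for kind, _ in packets])
    print(decode_stream(packets) == video)
```

In this toy version a delta is as large as a full frame; any saving would come from deltas being sparse and therefore cheaper to represent, which is the intuition behind a smaller visual token budget for mostly static scenes.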