Researchers have developed SPARE, a training-free method that reduces the computational load of Vision Language Models (VLMs) by pruning visual tokens. Unlike earlier diversity-maximizing strategies, which ignore token magnitude, SPARE reformulates token reduction as a subspace reconstruction problem and selects tokens to minimize reconstruction error. The method also incorporates an "anti-relevance" criterion, which identifies tokens that better preserve contextual information despite their low image-text relevance. Experiments show SPARE can remove up to 94% of visual tokens from models such as LLaVA while retaining 95% of baseline performance, without any additional training.
AI IMPACT This method could significantly reduce the computational cost of deploying VLMs, making them cheaper to run and more accessible across applications.
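To make the subspace reconstruction idea concrete, here is a minimal sketch in Python with NumPy. It is not the SPARE algorithm from the paper: it is a generic greedy subset selection, written under the assumption that visual tokens are the rows of a matrix and that "reconstruction error" means the squared distance between all tokens and their projection onto the span of the kept tokens. The function names `greedy_subspace_select` and `reconstruction_error` are hypothetical, and the "anti-relevance" criterion is left out because the summary does not describe how it is computed.

```python
import numpy as np


def greedy_subspace_select(tokens, k, eps=1e-12):
    """Greedily pick up to k row indices whose span best reconstructs all rows.

    Illustrative only: a generic greedy subset selection, not the paper's method.
    """
    residual = np.asarray(tokens, dtype=np.float64).copy()
    chosen = np.zeros(residual.shape[0], dtype=bool)
    selected = []
    for _ in range(k):
        # Squared norm of each token's residual (part not yet explained).
        norms_sq = np.einsum("ij,ij->i", residual, residual)
        gram = residual @ residual.T
        # Total error removed if token i's residual direction joins the basis.
        gain = (gram ** 2).sum(axis=0) / np.maximum(norms_sq, eps)
        gain[(norms_sq <= eps) | chosen] = -np.inf
        best = int(np.argmax(gain))
        if not np.isfinite(gain[best]):
            break  # every remaining token is already fully reconstructed
        selected.append(best)
        chosen[best] = True
        # Remove the newly covered direction from every residual.
        q = residual[best] / np.sqrt(norms_sq[best])
        residual -= np.outer(residual @ q, q)
    return selected


def reconstruction_error(tokens, selected):
    """Squared error of reconstructing all tokens from the span of the kept ones."""
    X = np.asarray(tokens, dtype=np.float64)
    B = X[selected]
    coeffs, *_ = np.linalg.lstsq(B.T, X.T, rcond=None)
    return float(np.linalg.norm(X - (B.T @ coeffs).T) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_tokens = rng.normal(size=(64, 16))  # synthetic stand-in for visual tokens
    kept = greedy_subspace_select(toy_tokens, k=4)
    print(kept, reconstruction_error(toy_tokens, kept))
```

Because the gain is measured on unnormalized tokens, a large-magnitude token contributes more to the error and is more likely to be kept. This is one way a reconstruction objective can account for magnitude, which a selection based purely on pairwise diversity between normalized tokens would not.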
RANK_REASON The cluster contains a research paper detailing a new method for optimizing VLMs.
AI-generated summary · Google Gemini · from 1 source.