miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity
Researchers have developed miniReranker, an approach for making multimodal large language models (MLLMs) more efficient when used as rerankers. It replaces the standard query-first input formulation with a vision-first one, which improves both cache reuse and reranking performance. The system cuts cost further by reducing active parameters through early exits, limiting cross-segment attention, and pruning visual tokens. In high-reuse scenarios it retains over 96% of the dense model's performance while running in under 1% of the dense model's runtime.
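Why the ordering matters for cache reuse can be illustrated with a toy token-count model. This is not miniReranker's implementation. It assumes a causal decoder with a prefix KV cache, a fixed query length, and a hypothetical 576 visual tokens per image. It counts only tokens fed through the model and ignores attention cost over cached states. The names `Candidate` and `tokens_processed` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    image_id: str
    num_visual_tokens: int  # hypothetical per-image visual token count


def tokens_processed(queries, candidates, query_len, vision_first):
    """Count tokens fed through the model to score every (query, candidate) pair.

    Toy model (an assumption, not the paper's setup): a causal decoder with a
    prefix KV cache. Each input is prefix + suffix. The prefix's KV states are
    computed once and reused whenever the same prefix reappears. The suffix is
    always recomputed because it attends to the prefix.
    """
    cached_prefixes = set()
    total = 0
    for q in queries:
        for c in candidates:
            if vision_first:
                # Image first: its KV states depend only on the image, so they
                # are shared by every query that scores this candidate.
                key = ("image", c.image_id)
                prefix_len, suffix_len = c.num_visual_tokens, query_len
            else:
                # Query first: visual tokens attend to the query, so their KV
                # states differ per query; only the short query prefix is shared.
                key = ("query", q)
                prefix_len, suffix_len = query_len, c.num_visual_tokens
            if key not in cached_prefixes:
                total += prefix_len
                cached_prefixes.add(key)
            total += suffix_len
    return total


if __name__ == "__main__":
    queries = [f"q{i}" for i in range(100)]
    candidates = [Candidate(f"img{j}", 576) for j in range(20)]
    qf = tokens_processed(queries, candidates, query_len=32, vision_first=False)
    vf = tokens_processed(queries, candidates, query_len=32, vision_first=True)
    print(qf, vf)  # 1155200 75520
```

With these made-up numbers (100 queries, 20 candidate images), the vision-first layout processes about 6.5% of the tokens the query-first layout does, because each image's visual prefix is encoded once rather than once per query. The sketch does not model early exits, restricted cross-segment attention, or visual token pruning, and its figures are not the paper's results.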
IMPACT Makes multimodal AI rerankers cheaper to run, potentially speeding up search and recommendation applications.