Brief · PulseAugur

TOOL · arXiv cs.IR (Information Retrieval) English(EN) · 20h

miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity

Researchers have developed miniReranker, an approach for improving the efficiency of multimodal large language models (MLLMs) used as rerankers. The system reorders the standard query-first input formulation into a vision-first one, which improves both cache reuse and reranking performance. miniReranker further cuts computation by reducing active parameters through early exits, limiting cross-segment attention, and pruning visual tokens, achieving over 96% of dense model performance while reducing runtime to less than 1% in high-reuse scenarios.
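To illustrate why input order matters for cache reuse, here is a toy sketch that is not taken from the paper. It assumes a causal model with prefix caching, where a token's cached state can be reused only if every token before it is identical. The function `tokens_computed`, the token counts, and the image and query names are all hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import List, Set, Tuple

Prompt = Tuple[str, ...]


def tokens_computed(prompts: List[Prompt]) -> int:
    """Count tokens that must be freshly encoded under a toy prefix cache.

    A token is served from the cache only when the full prefix ending at
    that token has been seen before (a simplification of prefix KV caching).
    """
    seen: Set[Prompt] = set()
    computed = 0
    for prompt in prompts:
        for end in range(1, len(prompt) + 1):
            prefix = prompt[:end]
            if prefix not in seen:
                seen.add(prefix)
                computed += 1
    return computed


def make_tokens(name: str, count: int) -> Prompt:
    """Build `count` distinct placeholder tokens for one segment."""
    return tuple(f"{name}:{i}" for i in range(count))


# Hypothetical workload: 2 candidate images, 3 queries.
# The sizes are assumptions: images are long (100 tokens), queries short (5).
images = [make_tokens(f"img{j}", 100) for j in range(2)]
queries = [make_tokens(f"q{i}", 5) for i in range(3)]

# Query-first: the visual tokens follow the query, so their cached state
# depends on the query and cannot be shared across different queries.
query_first = [q + d for q, d in product(queries, images)]

# Vision-first: the visual tokens form the prefix, so each image is
# encoded once and reused for every query that is scored against it.
vision_first = [d + q for q, d in product(queries, images)]

query_first_cost = tokens_computed(query_first)
vision_first_cost = tokens_computed(vision_first)
no_cache_cost = sum(len(p) for p in query_first)

print("no cache:    ", no_cache_cost)
print("query-first: ", query_first_cost)
print("vision-first:", vision_first_cost)
```

In this toy setting the vision-first order encodes each image once regardless of how many queries arrive, so the saving grows with the number of queries per candidate. This is consistent with the brief's point that the gains are largest in high-reuse scenarios.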

IMPACT Makes MLLM-based reranking more efficient, potentially accelerating search and recommendation applications.

arXiv
multimodal large language models
miniReranker