Argus-Retriever: Vision-LLM Late-Interaction Retrieval with Region-Aware Query-Conditioned MoE for Visual Document Retrieval
Researchers have introduced Argus-Retriever (Argus), a late-interaction retrieval system for visual documents built on a vision-LLM. Whereas previous methods compute a static embedding for each document, Argus uses a region-aware Mixture-of-Experts module to produce query-conditioned document representations, so the representation adapts to the specific query. This leads to improved performance on visual document retrieval tasks. The Argus-9B model achieved state-of-the-art results on the ViDoRe leaderboard, outperforming existing open late-interaction models.
AI IMPACT: Advances visual document retrieval, potentially improving how LLM agents access and process information from complex visual documents.
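
For readers unfamiliar with late interaction, the sketch below shows the generic MaxSim scoring rule popularized by ColBERT-style retrievers: each query token is matched to its most similar document vector, and the per-token maxima are summed. It is a minimal illustration and not Argus's implementation. The function names, the toy 2-dimensional vectors, and the page labels are all assumptions made for the example, and the region-aware, query-conditioned Mixture-of-Experts step described above is not modeled here.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_patches):
    """Generic late-interaction (MaxSim) score.

    For every query token vector, take the highest cosine similarity
    against any document patch vector, then sum those maxima.
    Assumes doc_patches is non-empty.
    """
    q = [normalize(t) for t in query_tokens]
    d = [normalize(p) for p in doc_patches]
    score = 0.0
    for q_vec in q:
        score += max(sum(a * b for a, b in zip(q_vec, d_vec)) for d_vec in d)
    return score


# Hypothetical toy data: two query tokens and two candidate pages,
# each page represented by a few patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
page_b = [[1.0, 1.0], [-1.0, 0.0]]

pages = {"page_a": page_a, "page_b": page_b}
ranked = sorted(
    pages.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)

for name, patches in ranked:
    print(name, round(maxsim_score(query, patches), 4))
```

In this toy setup the document vectors are fixed before the query arrives, which is the static case the summary contrasts Argus against. A query-conditioned system would instead adjust the document-side vectors as a function of the query before a scoring step of this kind.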