Constrained Dominant Sets for Multimodal Document Question Answering
Researchers have developed Constrained Dominant Sets (CDS), a new retrieval method for multimodal document question answering. It targets a weakness of current systems on long documents: instead of retrieving near-duplicate pages, CDS selects complementary evidence. The method encodes the query as a structural constraint, balances relevance against redundancy automatically, and replaces greedy selection heuristics with an evidence set chosen jointly at a global equilibrium. Paired with a Qwen3-VL-32B reader, CDS sets a new state of the art on VisDoMBench and significantly improves performance on MMLongBench-Doc.
IMPACT Establishes a new SOTA on VisDoMBench and improves retrieval over long multimodal documents.
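The summary does not describe how CDS is computed. As a rough illustration only, the sketch below implements the classic constrained dominant set formulation from the graph-clustering literature: maximize x^T (A - alpha * I_hat) x over the probability simplex, where I_hat marks the nodes outside a constraint set S, and solve it with replicator dynamics. It assumes, hypothetically, that the query is added as a graph node and used as S. The function `constrained_dominant_set`, the toy affinity matrix, and the value of alpha are all made up for illustration; how the CDS paper builds its affinities, encodes the query, and penalizes redundancy is not stated here, and this sketch does not attempt to reproduce it.

```python
from typing import List, Sequence, Set, Tuple


def constrained_dominant_set(
    affinity: Sequence[Sequence[float]],
    constraint: Set[int],
    alpha: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
    support_eps: float = 1e-4,
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: constrained dominant set via replicator dynamics.

    Maximizes x^T (A - alpha * I_hat) x over the probability simplex, where
    I_hat is diagonal with 1 for nodes OUTSIDE the constraint set, 0 inside.
    `affinity` should be symmetric, non-negative, with a zero diagonal.
    """
    n = len(affinity)
    # Penalize unconstrained nodes on the diagonal: B = A - alpha * I_hat.
    b = [
        [affinity[i][j] - (alpha if i == j and i not in constraint else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # Replicator dynamics need non-negative payoffs. Adding a constant c to
    # every entry adds c to x^T B x on the simplex, so maximizers are unchanged.
    shift = max(0.0, -min(min(row) for row in b))
    b = [[v + shift for v in row] for row in b]

    x = [1.0 / n] * n  # start from the barycenter of the simplex
    for _ in range(max_iter):
        bx = [sum(b[i][j] * x[j] for j in range(n)) for i in range(n)]
        payoff = sum(x[i] * bx[i] for i in range(n))
        if payoff <= 0.0:
            break  # degenerate graph with no affinity mass to redistribute
        new_x = [x[i] * bx[i] / payoff for i in range(n)]
        delta = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if delta < tol:
            break
    # Nodes that keep non-negligible weight form the selected set.
    support = [i for i in range(n) if x[i] > support_eps]
    return x, support


# Toy graph with made-up numbers: node 0 is the query, nodes 1-2 are pages
# related to it and to each other, nodes 3-4 are a tightly linked off-topic pair.
A = [
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.95],
    [0.1, 0.1, 0.1, 0.95, 0.0],
]
# alpha = 1.5 exceeds the largest eigenvalue (about 1.06) of A restricted to
# nodes 1-4, the condition under which, in the constrained dominant set
# formulation, the selected support must include a constraint node.
weights, selected = constrained_dominant_set(A, constraint={0}, alpha=1.5)
print(selected)
```

In this toy run the equilibrium keeps the query and pages 1 and 2 and drops the off-topic pair, even though those two pages are very similar to each other. Because the selection is the support of an equilibrium rather than a top-k cut, the number of returned nodes is not fixed in advance.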