Researchers have introduced Glance-or-Gaze (GoG), a framework designed to improve how Large Multimodal Models (LMMs) handle knowledge-intensive visual queries. Unlike previous methods that retrieve information indiscriminately, GoG uses a Selective Gaze mechanism to adaptively focus on either relevant image regions or the global context. The framework is trained in two stages, combining supervised fine-tuning with complexity-adaptive reinforcement learning to strengthen iterative reasoning and performance on complex visual tasks.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT: Introduces a novel adaptive search mechanism for LMMs, potentially improving efficiency and accuracy in complex visual query tasks.
RANK_REASON: This is a research paper detailing a new framework for LMMs.