Researchers develop novel RL framework for precise GUI grounding

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed Propose-then-Critic, a framework for GUI grounding: mapping natural language instructions to precise pixel coordinates within graphical user interfaces. The method uses reinforcement learning to train a 'proposer' module that generates candidate targets and a 'critic' module that evaluates them and selects the best one. The two modules co-evolve: the proposer's diverse outputs make the critic more robust, and the critic's improving judgment lets the proposer explore more options. Experiments across six benchmarks showed significant improvements in both grounding accuracy and critic reliability.
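
The propose-then-select step described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names (propose, make_center_critic, select_best), the random sampling used as a stand-in proposer, and the distance-based stand-in critic are hypothetical and are not taken from the paper, whose proposer and critic are learned models trained with reinforcement learning.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def propose(rough_guess: Point, n: int, spread: float, rng: random.Random) -> List[Point]:
    """Hypothetical proposer: sample n candidate click points around a rough guess."""
    x, y = rough_guess
    return [
        (x + rng.uniform(-spread, spread), y + rng.uniform(-spread, spread))
        for _ in range(n)
    ]


def make_center_critic(target: Box) -> Callable[[Point], float]:
    """Hypothetical critic: score a point by negative squared distance to the box center.

    A real critic would not know the target box; this stand-in only makes
    the selection step concrete.
    """
    cx = (target[0] + target[2]) / 2.0
    cy = (target[1] + target[3]) / 2.0

    def score(point: Point) -> float:
        return -((point[0] - cx) ** 2 + (point[1] - cy) ** 2)

    return score


def select_best(candidates: List[Point], critic: Callable[[Point], float]) -> Point:
    """Return the candidate with the highest critic score."""
    return max(candidates, key=critic)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidates = propose((100.0, 50.0), n=5, spread=10.0, rng=rng)
critic = make_center_critic((90.0, 40.0, 110.0, 60.0))
best = select_best(candidates, critic)
```

The sketch separates generation from judgment so that either part can be swapped out independently, which mirrors the two-module split in the summary. It does not model training or the co-evolution of the two modules.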


COVERAGE [1]

arXiv cs.CV TIER_1 · Shengyu Zhang · 2026-04-23 04:23

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Graphical User Interface (GUI) grounding requires mapping natural language instructions to precise pixel coordinates. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic intent yet struggle with achieving precise localization. While sc…
