Grounding Computer Use Agents on Human Demonstrations
Researchers have introduced GroundCUA, a large-scale dataset for improving how computer-use agents ground natural language instructions, that is, map them accurately to on-screen elements in desktop environments. The dataset comprises 56,000 screenshots with over 3.56 million human-verified annotations across 87 applications. Trained on this dataset, the GroundNext models, at 3B and 7B parameter scales, achieved state-of-the-art performance on five benchmarks while using significantly less training data than previous methods.
AI IMPACT: Enhances AI agent capabilities for desktop environments, potentially leading to more sophisticated automation tools.