Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

Grounding Computer Use Agents on Human Demonstrations

Researchers have introduced GroundCUA, a large-scale dataset for training computer-use agents to connect natural language instructions to the correct on-screen elements in desktop environments. The dataset comprises 56,000 screenshots with over 3.56 million human-verified annotations across 87 applications. Trained on this dataset, the GroundNext models, at 3B and 7B parameter scales, achieved state-of-the-art performance on five benchmarks while using significantly less training data than previous methods.

IMPACT Improves how reliably AI agents locate on-screen elements in desktop applications, which could lead to more capable automation tools.

GroundCUA
Aarash Feizi
GroundNext