IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents
Researchers have developed a new reward model called IntentScore to improve the reliability of computer-use agents (CUAs) that automate desktop tasks. CUAs often make irreversible errors because they lack a mechanism to evaluate the quality of their actions. IntentScore addresses this by learning to score candidate actions based on their relevance and correctness, achieving 97.5% accuracy in pairwise discrimination. When deployed on the OSWorld environment, IntentScore boosted task success rates by 6.9 points, demonstrating its effectiveness in unseen scenarios.
AI IMPACT: Enhances the reliability and success rate of AI agents performing desktop tasks, reducing costly errors.
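
To make the two ideas in the summary concrete, here is a minimal Python sketch of how a scorer could rerank candidate actions and how pairwise discrimination accuracy could be computed. It is an illustration under stated assumptions, not IntentScore itself: `pick_best_action`, `pairwise_accuracy`, and `toy_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a learned reward model.

```python
from typing import Callable, Sequence, Tuple

# A scorer maps (intent, candidate action) to a number; higher means better.
Scorer = Callable[[str, str], float]


def pick_best_action(intent: str, candidates: Sequence[str], score: Scorer) -> str:
    """Return the candidate action that the scorer rates highest for this intent."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=lambda action: score(intent, action))


def pairwise_accuracy(pairs: Sequence[Tuple[str, str, str]], score: Scorer) -> float:
    """Fraction of (intent, good_action, bad_action) triples in which the
    good action receives a strictly higher score than the bad one."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(score(intent, good) > score(intent, bad) for intent, good, bad in pairs)
    return correct / len(pairs)


def toy_score(intent: str, action: str) -> float:
    """Hypothetical stand-in scorer (NOT the learned model): the share of
    intent words that also appear in the action description."""
    intent_words = set(intent.lower().split())
    action_words = set(action.lower().split())
    return len(intent_words & action_words) / max(len(intent_words), 1)


if __name__ == "__main__":
    candidates = ["click close button", "click settings menu"]
    print(pick_best_action("open settings menu", candidates, toy_score))

    labelled_pairs = [
        ("open settings menu", "click settings menu", "click close button"),
        ("save file", "press delete key", "click save file"),
    ]
    print(pairwise_accuracy(labelled_pairs, toy_score))
```

In this sketch the agent would execute only the top-scored candidate, which is one plausible way a scorer of this kind could be used to filter out poor actions before they are taken.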