New methods BAMI and AutoFocus improve GUI grounding for AI agents

By PulseAugur Editorial · [5 sources] · 2026-05-04 14:18

Researchers have introduced two training-free methods, BAMI and AutoFocus, that improve GUI grounding for AI agents. BAMI targets precision and ambiguity biases with a coarse-to-fine focus step and candidate selection, raising the TianXi-Action-7B model's score on the ScreenSpot-Pro benchmark from 51.9% to 57.8%, a gain of 5.9 percentage points. AutoFocus addresses the resolution gap in high-resolution interfaces with uncertainty-aware active visual search: it uses token-level perplexity to model spatial uncertainty, and it improves grounding across several VLMs on benchmarks including ScreenSpot-Pro and ScreenSpot-V2.
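
To make the two ideas in the summary concrete, here is a minimal Python sketch of how token-level perplexity can act as an uncertainty score and how a prediction made inside a zoomed-in crop maps back to full-screen pixels. It is an illustration only, not code from either paper: the function names, the candidate dictionary layout, and the "pick the lowest-perplexity crop" rule are all assumptions.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a decoded sequence: exp of the mean negative log-probability.

    Higher values mean the model was less certain about the tokens it emitted.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def crop_to_screen(point, crop_box):
    """Map a point normalized to [0, 1] inside a crop back to screen pixels.

    crop_box is (left, top, right, bottom) in full-screen pixel coordinates.
    """
    x, y = point
    left, top, right, bottom = crop_box
    return (left + x * (right - left), top + y * (bottom - top))


def pick_region(candidates):
    """Hypothetical selection rule: keep the crop decoded with the lowest perplexity.

    Each candidate is a dict with a "box" and the "logprobs" of its coordinate tokens.
    """
    return min(candidates, key=lambda c: perplexity(c["logprobs"]))


# Two made-up candidate crops with made-up coordinate-token log-probabilities.
candidates = [
    {"box": (0, 0, 960, 540), "logprobs": [-1.2, -0.9, -1.5]},
    {"box": (960, 0, 1920, 540), "logprobs": [-0.1, -0.2, -0.1]},
]
best = pick_region(candidates)
click = crop_to_screen((0.5, 0.5), best["box"])
```

Under these assumptions, a coarse pass would propose crops, the least uncertain crop would be re-queried at higher effective resolution, and the final point would be converted back to screen coordinates for the click.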

IMPACT These methods could enhance the reliability and precision of AI agents interacting with graphical user interfaces, enabling more complex task automation.

RANK_REASON The cluster contains two arXiv papers detailing novel methods for improving AI agent performance in GUI grounding tasks.

AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-07 17:59

BAMI: Training-Free Bias Mitigation in GUI Grounding

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…
arXiv cs.CV TIER_1 English(EN) · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu · 2026-05-08 04:00

BAMI: Training-Free Bias Mitigation in GUI Grounding

arXiv:2605.06664v1. Abstract: GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance…
arXiv cs.CV TIER_1 English(EN) · Jiwen Lu · 2026-05-07 17:59

BAMI: Training-Free Bias Mitigation in GUI Grounding

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…
arXiv cs.CV TIER_1 English(EN) · Ruilin Yao, Shengwu Xiong, Tianyu Zou, Shili Xiong, Yi Rong · 2026-05-05 04:00

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

arXiv:2605.02630v1. Abstract: Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense lay…
arXiv cs.CV TIER_1 English(EN) · Yi Rong · 2026-05-04 14:18

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense layouts and small interactive elements expose a res…
