BAMI: Training-Free Bias Mitigation in GUI Grounding

New methods BAMI and AutoFocus improve GUI grounding for AI agents

By the PulseAugur editorial team · [5 sources] · 2026-05-04 14:18

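Researchers have developed two new training-free methods, BAMI and AutoFocus, to improve the accuracy of GUI grounding for AI agents. BAMI addresses precision and ambiguity biases through coarse-to-fine focusing and candidate selection, raising the TianXi-Action-7B model's score on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. AutoFocus addresses the resolution gap in high-resolution interfaces with uncertainty-aware active visual search, using token-level perplexity to model spatial uncertainty, and improves grounding for a range of VLMs on benchmarks such as ScreenSpot-Pro and ScreenSpot-V2.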
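For readers unfamiliar with the term, token-level perplexity is the exponential of the negative mean log-probability a model assigns to its own output tokens. The sketch below shows only that standard definition. The function name and the sample probabilities are hypothetical, and how AutoFocus turns the value into a search decision is not described in this summary.

```python
import math


def token_perplexity(logprobs):
    """Perplexity of a token sequence given natural-log probabilities."""
    if not logprobs:
        raise ValueError("need at least one token log-probability")
    # Perplexity = exp(-(1/N) * sum(log p_i)); higher means less confident.
    return math.exp(-sum(logprobs) / len(logprobs))


# Hypothetical log-probabilities for the tokens of two predicted click points.
confident = [math.log(0.9), math.log(0.8)]
uncertain = [math.log(0.3), math.log(0.2)]

# The less confident prediction has the higher perplexity.
print(token_perplexity(confident) < token_perplexity(uncertain))
```

Under this definition, a prediction whose coordinate tokens each receive low probability yields a high perplexity, which is the kind of signal an uncertainty-aware method could use to decide where to look more closely.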

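Impact: These methods can make AI agents' interactions with graphical user interfaces more reliable and precise, enabling more complex task automation.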

Ranking rationale: This cluster contains two arXiv papers detailing new methods for improving AI agent performance on GUI grounding tasks.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-07 17:59

BAMI: Training-Free Bias Mitigation in GUI Grounding

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

arXiv cs.CV TIER_1 English(EN) · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu · 2026-05-08 04:00

BAMI: Training-Free Bias Mitigation in GUI Grounding

arXiv:2605.06664v1. Abstract: GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance…

arXiv cs.CV TIER_1 English(EN) · Jiwen Lu · 2026-05-07 17:59

BAMI: Training-Free Bias Mitigation in GUI Grounding

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

arXiv cs.CV TIER_1 English(EN) · Ruilin Yao, Shengwu Xiong, Tianyu Zou, Shili Xiong, Yi Rong · 2026-05-05 04:00

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

arXiv:2605.02630v1. Abstract: Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense lay…

arXiv cs.CV TIER_1 English(EN) · Yi Rong · 2026-05-04 14:18

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense layouts and small interactive elements expose a res…

