PulseAugur

New methods BAMI and AutoFocus improve GUI grounding for AI agents

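Researchers have developed two training-free methods, BAMI and AutoFocus, to improve the accuracy of GUI grounding for AI agents. BAMI addresses precision bias and ambiguity bias through coarse-to-fine focusing and candidate selection, raising the performance of the TianXi-Action-7B model on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. AutoFocus addresses the resolution gap in high-resolution interfaces with uncertainty-aware active visual search, using token-level perplexity to model spatial uncertainty, and improves grounding for a range of VLMs on benchmarks such as ScreenSpot-Pro and ScreenSpot-V2.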
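To make the idea of token-level perplexity as an uncertainty signal concrete, here is a minimal sketch. It uses only the standard definition of perplexity (the exponential of the mean negative log-probability of the generated tokens). The `should_zoom` rule, its threshold, and the sample log-probabilities are hypothetical illustrations and are not taken from the AutoFocus paper.

```python
import math


def token_perplexity(token_logprobs):
    """Perplexity of a token sequence: exp of the mean negative log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def should_zoom(token_logprobs, threshold=1.5):
    """Hypothetical rule (an assumption, not the paper's): flag a prediction
    as uncertain when its perplexity exceeds a fixed threshold."""
    return token_perplexity(token_logprobs) > threshold


# Made-up log-probabilities for the tokens of two predicted click coordinates.
confident = [-0.05, -0.02, -0.10, -0.03]
uncertain = [-0.90, -1.20, -0.70, -1.50]

print(token_perplexity(confident), should_zoom(confident))
print(token_perplexity(uncertain), should_zoom(uncertain))
```

A perplexity near 1 means the model assigned high probability to every coordinate token, while larger values indicate a less certain prediction that a search procedure might choose to re-examine at higher resolution.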

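Impact: These methods could make AI agents' interactions with graphical user interfaces more reliable and precise, enabling more complex task automation.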

Ranking rationale: This cluster contains two arXiv papers detailing new methods that improve AI agent performance on GUI grounding tasks.


AI-generated summary · Google Gemini · from 5 sources.


Sources [5]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

  2. arXiv cs.CV TIER_1 English(EN) · Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    arXiv:2605.06664v1. GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance…

  3. arXiv cs.CV TIER_1 English(EN) · Jiwen Lu ·

    BAMI: Training-Free Bias Mitigation in GUI Grounding

    GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Predicti…

  4. arXiv cs.CV TIER_1 English(EN) · Ruilin Yao, Shengwu Xiong, Tianyu Zou, Shili Xiong, Yi Rong ·

    AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

    arXiv:2605.02630v1. Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense lay…

  5. arXiv cs.CV TIER_1 English(EN) · Yi Rong ·

    AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

    Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense layouts and small interactive elements expose a res…