PulseAugur

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

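Researchers have introduced A11y-Compressor, a framework that aims to make GUI agent observations more efficient by converting the linearized accessibility tree into a structured representation. The approach substantially reduces input tokens while improving task success rates. Separately, a new benchmark called WindowsWorld evaluates GUI agents on complex, multi-application professional workflows and shows that current agents perform poorly in such scenarios. In addition, VLAA-GUI offers a modular framework that addresses challenges such as early stopping and repetitive loops in autonomous GUI agents, with components for verification, loop breaking, and online search.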

Impact: New benchmarks and frameworks are emerging to advance the capabilities of GUI agents in complex, real-world scenarios.

Ranking rationale: Multiple arXiv papers introduce new frameworks, benchmarks, and methods for GUI agents.


AI-generated summary · Google Gemini · from 17 sources.


Sources [17]

  1. Hugging Face Blog TIER_1 English(EN) ·

    Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

  2. arXiv cs.CL TIER_1 English(EN) · Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, Hitoshi Iyatomi ·

    A11y-Compressor: A Framework for Enhancing GUI Agent Observation Efficiency through Visual Context Reconstruction and Redundancy Reduction

    arXiv:2605.00551v1. Abstract: AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but …

  3. arXiv cs.CL TIER_1 English(EN) · Hitoshi Iyatomi ·

    A11y-Compressor: A Framework for Enhancing GUI Agent Observation Efficiency through Visual Context Reconstruction and Redundancy Reduction

    AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural …

  4. arXiv cs.AI TIER_1 English(EN) · Jinchao Li, Yunxin Li, Chenrui Zhao, Zhenran Xu, Baotian Hu, Min Zhang ·

    WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

    arXiv:2604.27776v1. Abstract: While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single-application tasks. This overlooks a critical real-world requirement of coordina…

  5. arXiv cs.CL TIER_1 English(EN) · Min Zhang ·

    WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

    While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single-application tasks. This overlooks a critical real-world requirement of coordinating across multiple applications to accomplish …

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    Benchmarking and Improving GUI Agents in High-Dynamic Environments

    Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents …

  7. arXiv cs.CL TIER_1 English(EN) · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie ·

    VLAA-GUI: When to Stop, Recover, and Search, a Modular Framework for GUI Automation

    arXiv:2604.21375v2 (replacement). Abstract: Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recov…

  8. arXiv cs.CL TIER_1 English(EN) · Cihang Xie ·

    VLAA-GUI: When to Stop, Recover, and Search, a Modular Framework for GUI Automation

    Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic fram…

  9. Hugging Face Daily Papers TIER_1 English(EN) ·

    VLAA-GUI: When to Stop, Recover, and Search, a Modular Framework for GUI Automation

    Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic fram…

  10. arXiv cs.CV TIER_1 English(EN) · Yan Zhang, Daiqing Wu, Huawen Shen, Yu Zhou, Can Ma ·

    Learning from Oneself: On-Policy Self-Distillation for GUI Grounding

    arXiv:2605.00642v1 (cross-list). Abstract: Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO)…

  11. arXiv cs.CV TIER_1 English(EN) · Can Ma ·

    Learning from Oneself: On-Policy Self-Distillation for GUI Grounding

    Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely o…

  12. arXiv cs.CV TIER_1 English(EN) · Fengxian Ji, Jingpu Yang, Zirui Song, Yuanxi Wang, Zhexuan Cui, Yuke Li, Qian Jiang, Xiuying Chen ·

    FineState-Bench: A State-Conditioned Grounding Benchmark for Fine-Grained GUI State Setting

    arXiv:2604.27974v1. Abstract: Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreli…

  13. arXiv cs.CV TIER_1 English(EN) · Xiuying Chen ·

    FineState-Bench: A State-Conditioned Grounding Benchmark for Fine-Grained GUI State Setting

    Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and …

  14. arXiv cs.CV TIER_1 English(EN) · Enqi Liu, Liyuan Pan, Zhi Gao, Yan Yang, Chenrui Shi, Yang Liu, Jingrong Wu, Qing Li ·

    Benchmarking and Improving GUI Agents in High-Dynamic Environments

    arXiv:2604.25380v1. Abstract: Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments…

  15. arXiv cs.CV TIER_1 English(EN) · Qing Li ·

    Benchmarking and Improving GUI Agents in High-Dynamic Environments

    Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents …

  16. arXiv cs.CV TIER_1 English(EN) · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang ·

    AutoGUI-v2: A Comprehensive Benchmark for Multimodal GUI Functionality Understanding

    arXiv:2604.24441v1. Abstract: Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a …

  17. arXiv cs.CV TIER_1 English(EN) · Zhaoxiang Zhang ·

    AutoGUI-v2: A Comprehensive Benchmark for Multimodal GUI Functionality Understanding

    Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics an…