Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

By the PulseAugur editorial team · [17 sources] · 2025-06-03 13:27

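Researchers have introduced A11y-Compressor, a framework that aims to make GUI agent observations more efficient by converting linearized accessibility trees into structured representations. The method substantially reduces input tokens while improving task success rates. Separately, a new benchmark called WindowsWorld evaluates GUI agents on complex, multi-application professional workflows and shows that current agents perform poorly in such scenarios. In addition, VLAA-GUI offers a modular framework that addresses challenges in autonomous GUI agents such as early stopping and repetitive loops, with components for verification, loop breaking, and online search.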

Impact: New benchmarks and frameworks are emerging to advance the capabilities of GUI agents in complex, real-world scenarios.

Ranking rationale: Multiple arXiv papers introduce new frameworks, benchmarks, and methods for GUI agents.


AI-generated summary · Google Gemini · from 17 sources.


Sources [17]

Hugging Face Blog TIER_1 English(EN) · 2025-06-03 13:27

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H
arXiv cs.CL TIER_1 English(EN) · Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, Hitoshi Iyatomi · 2026-05-04 04:00

A11y-Compressor: A framework for enhancing GUI agent observation efficiency through visual context reconstruction and redundancy reduction

arXiv:2605.00551v1. AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but …
arXiv cs.CL TIER_1 English(EN) · Hitoshi Iyatomi · 2026-05-01 10:16

A11y-Compressor: A framework for enhancing GUI agent observation efficiency through visual context reconstruction and redundancy reduction

AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural …
arXiv cs.AI TIER_1 English(EN) · Jinchao Li, Yunxin Li, Chenrui Zhao, Zhenran Xu, Baotian Hu, Min Zhang · 2026-05-01 04:00

WindowsWorld: A process-centric benchmark for autonomous GUI agents in professional cross-application environments

arXiv:2604.27776v1. While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single-application tasks. This overlooks a critical real-world requirement of coordina…
arXiv cs.CL TIER_1 English(EN) · Min Zhang · 2026-04-30 12:13

WindowsWorld: A process-centric benchmark for autonomous GUI agents in professional cross-application environments

While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single-application tasks. This overlooks a critical real-world requirement of coordinating across multiple applications to accomplish …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-28 08:43

Benchmarking and improving GUI agents in high-dynamic environments

Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents …
arXiv cs.CL TIER_1 English(EN) · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie · 2026-04-27 04:00

VLAA-GUI: Knowing when to stop, recover, and search, a modular framework for GUI automation

arXiv:2604.21375v2. Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recov…
arXiv cs.CL TIER_1 English(EN) · Cihang Xie · 2026-04-23 07:42

VLAA-GUI: Knowing when to stop, recover, and search, a modular framework for GUI automation

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic fram…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-23 07:42

VLAA-GUI: Knowing when to stop, recover, and search, a modular framework for GUI automation

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic fram…
arXiv cs.CV TIER_1 English(EN) · Yan Zhang, Daiqing Wu, Huawen Shen, Yu Zhou, Can Ma · 2026-05-04 04:00

Learning from oneself: On-policy self-distillation for GUI grounding

arXiv:2605.00642v1. Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO)…
arXiv cs.CV TIER_1 English(EN) · Can Ma · 2026-05-01 13:23

Learning from oneself: On-policy self-distillation for GUI grounding

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely o…
arXiv cs.CV TIER_1 English(EN) · Fengxian Ji, Jingpu Yang, Zirui Song, Yuanxi Wang, Zhexuan Cui, Yuke Li, Qian Jiang, Xiuying Chen · 2026-05-01 04:00

FineState-Bench: A benchmark for state-conditioned grounding in fine-grained GUI state setting

arXiv:2604.27974v1. Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreli…
arXiv cs.CV TIER_1 English(EN) · Xiuying Chen · 2026-04-30 15:03

FineState-Bench: A benchmark for state-conditioned grounding in fine-grained GUI state setting

Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and …
arXiv cs.CV TIER_1 English(EN) · Enqi Liu, Liyuan Pan, Zhi Gao, Yan Yang, Chenrui Shi, Yang Liu, Jingrong Wu, Qing Li · 2026-04-29 04:00

Benchmarking and improving GUI agents in high-dynamic environments

arXiv:2604.25380v1. Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments…
arXiv cs.CV TIER_1 English(EN) · Qing Li · 2026-04-28 08:43

Benchmarking and improving GUI agents in high-dynamic environments

Recent advancements in Graphical User Interface (GUI) agents have predominantly focused on training paradigms like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the challenge of high-dynamic GUI environments remains largely underexplored. Existing agents …
arXiv cs.CV TIER_1 English(EN) · Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang · 2026-04-28 04:00

AutoGUI-v2: A comprehensive benchmark for multimodal GUI functionality understanding

arXiv:2604.24441v1. Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a …
arXiv cs.CV TIER_1 English(EN) · Zhaoxiang Zhang · 2026-04-27 13:06

AutoGUI-v2: A comprehensive benchmark for multimodal GUI functionality understanding

Autonomous agents capable of navigating Graphical User Interfaces (GUIs) hold the potential to revolutionize digital productivity. However, achieving true digital autonomy extends beyond reactive element matching; it necessitates a predictive mental model of interface dynamics an…
