New Benchmark and Data Synthesis Improve GUI Agents' Error Recovery

By the PulseAugur editorial team · [2 sources] · 2026-05-28 00:00

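Researchers have developed a new benchmark and a data synthesis framework to improve the error-recovery ability of GUI agents. The benchmark, GUI-RobustEval, contains more than 1,200 test cases and systematically measures how well agents recover from their own mistakes. In addition, a framework called RoTS generated 800,000 data points for training agents to handle a range of error patterns and their corresponding recovery steps. Models fine-tuned on this data, such as RoTS-32B, show significant performance gains and achieve state-of-the-art results on benchmarks such as OSWorld.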

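Impact: By improving AI agents' ability to recover from errors they cause themselves, the work makes them more reliable and may speed up their real-world deployment.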

Ranking rationale: This cluster contains a research paper that details a new benchmark and data synthesis framework for AI agents.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Tianpeng Bu, Xin Liu, Qihua Chen, Hao Jiang, Shurui Li, Hongtao Duan, Lu Jiang, Lulu Hu, Bin Yang, Minying Zhang · 2026-05-29 04:00

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

arXiv:2605.29447v1 Announce Type: cross Abstract: While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation and data levels, we introduce GUI-RobustEval and p…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

GUI agents lack robust error recovery capabilities, which this work addresses through GUI-RobustEval and Robustness-driven Trajectory Synthesis, demonstrating improved performance on real-world benchmarks.
