Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

New Benchmark Tests GUI Agents on Dynamic Short-Video Platforms

By the PulseAugur editorial team · [1 source] · 2026-06-04 04:00

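Researchers have introduced LivingScreen, a new benchmark for evaluating GUI agents on dynamic short-video platforms. Whereas existing agents assume a static screen, agents on LivingScreen must operate while content keeps playing, and must decide when to observe and for how long. An evaluation of current frontier models found that none matches human performance in either accuracy or cost-efficiency. Common failures include observing too much or too little, which points to a need for better observation control in future GUI agents.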

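Impact: The benchmark highlights a key gap in how current GUI agents handle dynamic environments, and it may steer future research toward more adaptive and efficient AI systems.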

Ranking rationale: This cluster contains an academic paper that introduces a new benchmark for evaluating AI agents.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Jiashu Yao, Heyan Huang, Daiqing Wu, Wangke Chen, Huaxi Ai, Haoyu Wen, Zeming Liu, Yuhang Guo · 2026-06-04 04:00

Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

arXiv:2606.04701v1 Announce Type: cross Abstract: GUI agents today assume a static screen, where the world is frozen between two actions. However, real interfaces such as short-video applications violate this assumption, as their content keeps playing, and a competent user must d…
