Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Video2GUI generates 12 million GUI trajectories from unlabeled videos

By the PulseAugur editorial team · [1 source] · 2026-05-14 12:14

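Researchers have developed Video2GUI, an automated framework for generating large-scale interaction trajectories to train GUI agents. The system mines unlabeled internet videos and, through a filtering process, converts them into structured agent trajectories. The resulting WildGUI dataset contains 12 million trajectories covering more than 1,500 applications and substantially improves pretraining for models such as Qwen2.5-VL and Mimo-VL.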
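The summary does not describe WildGUI's record format or the criteria its filtering step applies. The sketch below is purely illustrative. It assumes that a trajectory is an ordered list of timestamped actions extracted from a single video, and that filtering discards clips that are too short, contain unrecognized actions, or have out-of-order timestamps. All names here (`Step`, `Trajectory`, `KNOWN_ACTIONS`, `keep_trajectory`, `filter_trajectories`) and the thresholds are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema: the actual WildGUI trajectory format is not given in the summary.
@dataclass
class Step:
    timestamp_s: float              # position of the frame in the source video, in seconds
    action: str                     # e.g. "click", "type", "scroll"
    target: Optional[str] = None    # UI element or text the action refers to

@dataclass
class Trajectory:
    app: str
    video_id: str
    steps: List[Step] = field(default_factory=list)

# Hypothetical action vocabulary, chosen only for illustration.
KNOWN_ACTIONS = {"click", "type", "scroll", "drag"}

def keep_trajectory(traj: Trajectory, min_steps: int = 3) -> bool:
    """Illustrative filter: keep a clip only if it has enough steps,
    every action is in the known vocabulary, and timestamps strictly increase."""
    if len(traj.steps) < min_steps:
        return False
    if any(s.action not in KNOWN_ACTIONS for s in traj.steps):
        return False
    times = [s.timestamp_s for s in traj.steps]
    return all(a < b for a, b in zip(times, times[1:]))

def filter_trajectories(candidates: List[Trajectory]) -> List[Trajectory]:
    """Return only the candidate trajectories that pass keep_trajectory."""
    return [t for t in candidates if keep_trajectory(t)]
```

With these assumptions, a three-step clip such as click "File", click "Save As", type "report.txt" is kept, while a single-click clip is dropped. A real pipeline would likely rely on learned models rather than fixed rules. The rules above only show where a filtering stage fits between raw video and structured trajectories.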

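Impact: Enables the creation of large-scale datasets for GUI agents, which may improve their generalization and performance across a wide range of applications.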

Ranking rationale: An academic paper introducing a new method and dataset for GUI agent pretraining.

Read on arXiv cs.CV

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Hao Tian · 2026-05-14 12:14

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Recent advances in multimodal large language models have driven growing interest in graphical user interface (GUI) agents, yet their generalization remains constrained by the scarcity of large-scale training data spanning diverse real-world applications. Existing datasets rely he…
