Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents

New VLM Extracts Operational Knowledge from Mobile Screen Demonstrations

By the PulseAugur editorial team · [1 source] · 2026-06-12 04:00

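Researchers have developed a new method, called Teach VLM, for extracting operational knowledge from mobile screen demonstrations. The model analyzes keyframes in a video to understand the actions performed, the UI elements involved, and the order of execution, converting visual state transitions into natural language descriptions. To overcome data scarcity, the researchers built a systematic data flywheel for scalable data acquisition and introduced a Chinese mobile screen teaching benchmark for evaluation. The Teach-and-Repeat paradigm uses this operational knowledge to guide screen-based execution agents and shows a significant improvement in task success rate on Android World.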

Impact: This research could enable more sophisticated GUI agents by improving their ability to understand and replicate user actions on mobile devices.

Ranking rationale: This cluster contains a research paper detailing a new model and method for extracting operational knowledge from mobile screen demonstrations.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Yudong Zhang (Honor Device Co., Ltd), Lei Hu (Honor Device Co., Ltd), Daoyang Liu (The Chinese University of Hong Kong, Hong Kong, China), Jiawei Liu (Honor Device Co., Ltd), Yangfan Luo (Honor Device Co., Ltd), Xingyu Liu (Honor Device Co., Ltd), Zuojia… · 2026-06-12 04:00

Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents

arXiv:2606.12817v1. Abstract: Understanding the digital world on mobile devices is shifting from static UI perception to dynamic action comprehension. This capability enables models to convert visual state transitions into operational knowledge, defined as short…

