PulseAugur
English(EN) How Mobile World Model Guides GUI Agents?

Mobile world models enhance GUI agents through multimodal prediction

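Researchers have developed a novel approach that uses a "mobile world model" to strengthen GUI agents. The study explores four modalities (delta text, full text, diffusion-based images, and renderable code) for predicting the consequences of actions in mobile interfaces. The results suggest that renderable code offers high fidelity on in-distribution tasks, while text-based feedback is more robust for online execution. Trajectories generated by these world models can improve agent performance by supplying transferable interaction experience, although they may not perfectly preserve the original data distribution. The study also finds that, for agents prone to overconfidence, a world model is more effective as prior perception or as training supervision than as a post-hoc verifier.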

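Impact: Improves the reliability and task performance of GUI agents through multimodal world modeling and transferable interaction experience.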

Ranking rationale: This cluster contains an academic paper detailing a new method for guiding GUI agents with a mobile world model.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Guohong Liu, Jialei Ye, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li ·

    SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

    arXiv:2605.25160v1 Announce Type: new Abstract: Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps o…

  2. arXiv cs.AI TIER_1 English(EN) · Dingbang Wu, Rui Hao, Haiyang Wang, Shuzhe Wu, Han Xiao, Zhenghong Li, Bojiang Zhou, Zheng Ju, Zichen Liu, Lue Fan, Zhaoxiang Zhang ·

    MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

    arXiv:2605.26114v1 Announce Type: new Abstract: We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reac…

  3. arXiv cs.AI TIER_1 English(EN) · Zhaoxiang Zhang ·

    MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

    We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reach for everyday apps: verifiable outcome signals …

  4. arXiv cs.AI TIER_1 English(EN) · Weikai Xu, Kun Huang, Yunren Feng, Jiaxing Li, Yuhan Chen, Yuxuan Liu, Zhizheng Jiang, Heng Qu, Pengzhi Gao, Wei Liu, Jian Luan, Xiaolin Hu, Bo An ·

    How Mobile World Model Guides GUI Agents?

    arXiv:2605.10347v2 Announce Type: replace Abstract: Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk…