New training methods boost VLM mobile agents' interactive and safety capabilities

By the PulseAugur editorial team · [2 sources] · 2026-04-28 04:00

Researchers have developed two new approaches for enhancing vision-language model (VLM)-based mobile agents. Mobile-R1 introduces a hierarchical curriculum to improve exploration and self-correction, addressing the sparse rewards of GUI interaction. InquireMobile focuses on safety by teaching agents to request human assistance at critical decision points, and introduces a benchmark, InquireBench, to evaluate this capability.

Impact: New training methodologies and benchmarks aim to improve the reliability and safety of VLM-based mobile agents in complex GUI environments.

Ranking rationale: The cluster contains two arXiv papers introducing new methods and benchmarks for VLM-based mobile agents.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Jihao Gu, Qihang Ai, Yingyao Wang, Pi Bu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Ziming Wang, Yingxiu Zhao, Ming-Liang Zhang, Jun Song, Yuning Jiang, Bo Zheng · 2026-04-28 04:00

Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

arXiv:2506.20332v4 Announce Type: replace Abstract: Vision-language model-based mobile agents have gained the ability to understand complex instructions and mobile screenshots, benefiting from reinforcement learning paradigms like Group Relative Policy Optimization (GRPO). Howeve…

arXiv cs.AI TIER_1 English(EN) · Qihang Ai, Pi Bu, Yue Cao, Yingyao Wang, Jihao Gu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Zhicheng Zheng, Jun Song, Yuning Jiang · 2026-04-28 04:00

InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning

arXiv:2508.19679v2 Announce Type: replace Abstract: Recent advances in Vision-Language Models (VLMs) have enabled mobile agents to perceive and interact with real-world mobile environments based on human instructions. However, the current fully autonomous paradigm poses potential…

