PulseAugur
research · [2 sources]
New training methods boost VLM mobile agents' interactive and safety capabilities

Researchers have developed two new approaches for enhancing the capabilities of vision-language model (VLM)-based mobile agents. Mobile-R1 introduces a hierarchical curriculum to improve exploration and self-correction, addressing the sparse-reward challenge in GUI interactions. InquireMobile focuses on safety by teaching agents to request human assistance at critical decision points, and introduces a new benchmark, InquireBench, to evaluate this capability.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT New training methodologies and benchmarks aim to improve the reliability and safety of VLM-based mobile agents in complex GUI environments.

RANK_REASON The cluster contains two arXiv papers introducing new methods and benchmarks for VLM-based mobile agents.


COVERAGE [2]

  1. arXiv cs.AI TIER_1 · Jihao Gu, Qihang Ai, Yingyao Wang, Pi Bu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Ziming Wang, Yingxiu Zhao, Ming-Liang Zhang, Jun Song, Yuning Jiang, Bo Zheng

    Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

    arXiv:2506.20332v4 Announce Type: replace Abstract: Vision-language model-based mobile agents have gained the ability to understand complex instructions and mobile screenshots, benefiting from reinforcement learning paradigms like Group Relative Policy Optimization (GRPO). Howeve…

  2. arXiv cs.AI TIER_1 · Qihang Ai, Pi Bu, Yue Cao, Yingyao Wang, Jihao Gu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Zhicheng Zheng, Jun Song, Yuning Jiang

    InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning

    arXiv:2508.19679v2 Announce Type: replace Abstract: Recent advances in Vision-Language Models (VLMs) have enabled mobile agents to perceive and interact with real-world mobile environments based on human instructions. However, the current fully autonomous paradigm poses potential…
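The Mobile-R1 abstract cites Group Relative Policy Optimization (GRPO) as the underlying RL paradigm. A minimal sketch of GRPO's core idea, normalizing each rollout's reward against its own group of rollouts rather than using a learned value baseline (function and variable names here are illustrative, not from either paper):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: z-score each reward within its rollout group.

    GRPO samples several rollouts per prompt and scores each policy sample
    against the group mean/std, avoiding a separate critic network.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps keeps the division stable when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts of the same GUI task under a sparse 0/1 reward.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

With a sparse binary reward, successful rollouts receive positive advantages and failures negative ones; when every rollout in the group fails, all advantages collapse to zero, which is exactly the sparse-reward problem the Mobile-R1 curriculum is described as addressing.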