New frameworks and benchmarks advance mobile GUI agent capabilities

By PulseAugur Editorial · [12 sources] · 2026-05-24 00:00

Recent papers introduce new frameworks and benchmarks for mobile GUI agents. STAMP trains explicit memory for agents in controllable virtual environments, targeting the long-horizon tasks where reactive agents tend to fail. PhoneWorld provides a scalable pipeline that converts real mobile trajectories and screenshots into controllable environments, executable tasks, and automated verifiers. MIRAGE exposes a vulnerability in VLM-driven agents: because they act on rendered pixels, prompt injection can be delivered through user-generated content. MobileExplorer accelerates on-device inference by exploring UI elements in parallel and using contextual hints. MobileGym offers a verifiable, highly parallel, browser-hosted simulation platform that enables deterministic evaluation and scalable reinforcement learning. SimuWoB presents a fully synthetic benchmark of 120 tasks that reveals significant weaknesses in current agents on complex, long-horizon interactions.

IMPACT These advancements in mobile GUI agents and their evaluation frameworks could accelerate the development and deployment of more capable and secure AI assistants on mobile devices.

RANK_REASON Multiple research papers introducing new frameworks, benchmarks, and techniques for mobile GUI agents.

AI-generated summary · Google Gemini · from 12 sources.

COVERAGE [12]

arXiv cs.CL TIER_1 English(EN) · Junyang Wang, Haiyang Xu, Xi Zhang, Zhaoqing Zhu, Ming Yan, Jieping Ye, Jitao Sang · 2026-05-29 04:00

STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments

arXiv:2605.29324v1 · Abstract: Mobile GUI agents excel at immediate reactive control but frequently fail in realistic, long-horizon tasks that require memory. This failure stems from a fundamental conflict between limited context windows and token-heavy screensho…
arXiv cs.AI TIER_1 English(EN) · Zhengyang Tang, Yuxuan Liu, Xin Lai, Junyi Li, Pengyuan Lyu, Jason, Yiduo Guo, Zhengyao Fang, Yang Ding, Yi Zhang, Weinong Wang, Huawen Shen, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Rui Yan, Ji-R… · 2026-05-29 04:00

PhoneWorld: Scaling Phone-Use Agent Environments

arXiv:2605.29486v1 · Abstract: A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important progress on evaluation, but t…
arXiv cs.AI TIER_1 English(EN) · Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong, Yuekang Li, Ying Zhang, Leo Yu Zhang, Lida Zhao, Ji Jie, Yuxiao Lu · 2026-05-28 04:00

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

arXiv:2605.28116v1 · Abstract: Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from us…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

PhoneWorld: Scaling Phone-Use Agent Environments

PhoneWorld is a pipeline that transforms real GUI trajectories and screenshots into controllable mobile environments, executable tasks, and automated verifiers, enabling scalable creation of phone-use benchmarks.
arXiv cs.AI TIER_1 English(EN) · Runxi Huang, Liyu Zhang, Shengzhong Liu, Xiaomin Ouyang · 2026-05-27 04:00

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

arXiv:2605.26546v1 · Abstract: Mobile graphical user interface (GUI) agents enable AI models to autonomously operate smartphones on behalf of users. However, most existing systems focus primarily on optimizing task accuracy and rely on cloud-hosted models for inf…
arXiv cs.AI TIER_1 English(EN) · Dingbang Wu, Rui Hao, Haiyang Wang, Shuzhe Wu, Han Xiao, Zhenghong Li, Bojiang Zhou, Zheng Ju, Zichen Liu, Lue Fan, Zhaoxiang Zhang · 2026-05-26 04:00

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

arXiv:2605.26114v1 · Abstract: We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reac…
arXiv cs.AI TIER_1 English(EN) · Guohong Liu, Jialei Ye, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li · 2026-05-26 04:00

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

arXiv:2605.25160v1 · Abstract: Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps o…
arXiv cs.AI TIER_1 English(EN) · Zhaoxiang Zhang · 2026-05-25 17:59

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reach for everyday apps: verifiable outcome signals …
arXiv cs.AI TIER_1 English(EN) · Weikai Xu, Kun Huang, Yunren Feng, Jiaxing Li, Yuhan Chen, Yuxuan Liu, Zhizheng Jiang, Heng Qu, Pengzhi Gao, Wei Liu, Jian Luan, Xiaolin Hu, Bo An · 2026-05-25 04:00

How Mobile World Model Guides GUI Agents?

arXiv:2605.10347v2 · Abstract: Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 00:00

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

MobileGym presents a browser-based mobile environment enabling deterministic evaluation and scalable reinforcement learning through JSON-based state management and parallel execution.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-24 00:00

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

A synthetic benchmark for mobile GUI agents with 120 challenging tasks is introduced, featuring high-fidelity virtual environments with automatic reward generation and revealing significant limitations in current agent performance on complex, long-horizon interactions.
arXiv cs.CV TIER_1 English(EN) · Yifan Sui, Xin Huang, Hongbing Li, Fang Xu, Jiahe Lv, Haolong Yan, Yeqing Shen, Litao Liu, Zhimin Fan, Ziyang Meng, Jia Wang, Junbo Qi, Kaijun Tan, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Osamu Yoshie · 2026-05-28 04:00

AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications

arXiv:2605.27761v1 · Abstract: The rapid development of GUI foundation models and mobile GUI agents has spurred numerous evaluation benchmarks, yet most rely on simulated environments or open-source applications, leaving real-world closed-source applications larg…
