New iOSWorld benchmark tests AI agents' personalization on mobile

By PulseAugur Editorial · [2 sources] · 2026-06-08 17:27

Researchers have introduced iOSWorld, a benchmark for evaluating how well AI agents personalize their behavior on mobile devices. It provides a simulated iOS environment with 26 interconnected apps that hold user-specific data such as messages and financial records. Its 133 tasks range from single-app operations to multi-app scenarios that require memory and inference about the user's preferences. In initial evaluations, even advanced models struggle: the best configuration reaches only 52% overall accuracy.

IMPACT The benchmark could help drive development of more personalized, context-aware AI agents for mobile devices.

RANK_REASON The cluster describes a new benchmark for AI agents, which falls under research.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov · 2026-06-09 04:00

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

arXiv:2606.09764v1. Abstract: A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile …
arXiv cs.CL TIER_1 English(EN) · Ruslan Salakhutdinov · 2026-06-08 17:27

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalizati…
