PulseAugur

New iOSWorld benchmark tests AI agents' personalization on mobile

Researchers have introduced iOSWorld, a benchmark for evaluating the personalization capabilities of AI agents on mobile devices. It provides a simulated iOS environment with 26 interconnected apps that store user-specific data such as messages and financial records. Its 133 tasks range from single-app operations to complex multi-app scenarios that require memory and personalization inference. Initial evaluations show that even advanced models struggle with these tasks: the best configuration achieves only 52% overall accuracy.

IMPACT This benchmark could drive the development of more personalized, context-aware AI agents for mobile devices.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov

    iOSWorld: A Benchmark for Personally Intelligent Phone Agents

    arXiv:2606.09764v1. Abstract: A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile …

  2. arXiv cs.CL TIER_1 English(EN) · Ruslan Salakhutdinov

    iOSWorld: A Benchmark for Personally Intelligent Phone Agents

    A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalizati…