Brief · PulseAugur

RESEARCH · arXiv cs.CL · 16h · 2 sources

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Researchers have introduced iOSWorld, a benchmark for evaluating how well AI agents on mobile devices handle personalization. The benchmark provides a simulated iOS environment with 26 interconnected apps that store user-specific data such as messages and financial records. It includes 133 tasks, ranging from single-app operations to complex multi-app scenarios that require the agent to recall stored information and infer the user's preferences. Initial evaluations show that even advanced models struggle with these tasks: the best-performing configuration reaches only 52% overall accuracy.

IMPACT: This benchmark could help drive the development of more personalized, context-aware AI agents for mobile devices.
