iOSWorld: A Benchmark for Personally Intelligent Phone Agents
Researchers have introduced iOSWorld, a new benchmark designed to evaluate the personalization capabilities of AI agents on mobile devices. The benchmark features a simulated iOS environment with 26 interconnected apps that store user-specific data such as messages and financial records. It includes 133 tasks, ranging from single-app operations to complex multi-app scenarios that require memory and personalization inference. Initial evaluations show that even advanced models struggle with these tasks: the best configuration achieves only 52% overall accuracy.
IMPACT: This benchmark could drive the development of more personalized and context-aware AI agents for mobile devices.