iOSWorld: A Benchmark for Personally Intelligent Phone Agents

New iOSWorld benchmark tests how well AI agents personalize their behavior on mobile devices

By the PulseAugur editorial team · [2 sources] · 2026-06-08 17:27

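Researchers have introduced iOSWorld, a new benchmark designed to evaluate how well AI agents personalize their behavior on mobile devices. The benchmark provides a simulated iOS environment with 26 interconnected apps that store user-specific data such as messages and financial records. It includes 133 tasks, ranging from single-app operations to complex multi-app scenarios that require memory and personalized reasoning. Initial evaluations show that even advanced models struggle with these tasks: the best configuration reaches only 52% overall accuracy.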

Impact: The benchmark is expected to drive the development of more personalized, context-aware AI agents for mobile devices.

Ranking rationale: This cluster describes a new benchmark for AI agents and falls under research.

Read on arXiv cs.CL.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov · 2026-06-09 04:00

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

arXiv:2606.09764v1 · Announce Type: new
Abstract: A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile …
arXiv cs.CL · Ruslan Salakhutdinov · 2026-06-08 17:27

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalizati…
