PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions
Researchers have introduced PhoneHarness, a benchmark and execution framework for evaluating AI agents that operate mobile devices. Unlike previous methods that focused solely on GUI controls, PhoneHarness supports mixed actions, letting agents use graphical user interfaces, command-line interfaces, and external tools. The framework assesses agents on their ability to complete verifiable mobile workflows with observable side effects, rather than on predicting the next screen action. On the associated benchmark, PhoneHarness Bench, it reached a 75.0% pass rate, 12.9 percentage points above existing settings, which highlights the importance of action-surface routing and verifiable execution for reliable phone automation.
IMPACT This new framework enables more robust evaluation of AI agents for mobile automation, pushing the field towards agents that can handle complex, real-world workflows.