Brief · PulseAugur

TOOL · arXiv cs.AI

Executable World Models for ARC-AGI-3 in the Era of Coding Agents

A new research paper introduces an executable world model approach for AI agents tackling the ARC-AGI-3 benchmark. The agent maintains its world model as Python code, verifies it, refactors it for simplicity, and plans actions against it before executing them. With GPT-5.5, the agent solved 15 of 25 games and achieved a 58.12% RHAE; with GPT-5.4, it solved 8 games with a 41.29% RHAE.
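To make the idea concrete, here is a minimal sketch of what "maintain, verify, then plan" could look like in Python. It is not the paper's implementation: the toy environment (a token on a five-cell strip) and the names `verify`, `plan`, and `toy_model` are all hypothetical, and breadth-first search stands in for whatever planner the agent actually uses.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = str
Model = Callable[[State, Action], State]
Transition = Tuple[State, Action, State]


def verify(model: Model, transitions: Sequence[Transition]) -> List[Transition]:
    """Return every recorded transition the model fails to reproduce."""
    return [(s, a, s2) for (s, a, s2) in transitions if model(s, a) != s2]


def plan(
    model: Model,
    start: State,
    is_goal: Callable[[State], bool],
    actions: Sequence[Action],
    max_depth: int = 20,
) -> Optional[List[Action]]:
    """Breadth-first search over model-predicted states.

    Returns a shortest action sequence reaching a goal state, or None.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Hypothetical toy world: a token on cells 0..4 that moves left or right.
def toy_model(state: int, action: str) -> int:
    if action == "left":
        return max(0, state - 1)
    if action == "right":
        return min(4, state + 1)
    return state


# Transitions the agent is assumed to have observed so far.
observed = [(0, "right", 1), (1, "right", 2), (0, "left", 0)]

mismatches = verify(toy_model, observed)  # empty when the model fits the log
route = plan(toy_model, 0, lambda s: s == 3, ["left", "right"])
```

In this sketch, a non-empty `mismatches` list would signal that the model code needs revising before any plan is trusted; only once it is empty does the agent search the model for an action sequence and act on it.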

IMPACT Demonstrates a promising approach for AI agents to solve complex reasoning and planning tasks, potentially improving performance on similar benchmarks.

GPT-5.5
GPT-5.4
ARC-AGI-3
Sergey Rodionov