A new solver for the ARC-AGI-2 visual reasoning benchmark has achieved a top score of 76.1% on the public evaluation set. The solver treats reasoning modalities (text, image, and code) as search operators that generate diverse candidate solutions. A holistic judge then compares all candidate traces together in a single prompt, which helps recover correct hypotheses held by only a minority of candidates. The solver outperformed leading models such as GPT-5.2 Pro and Gemini 3 Pro by a significant margin.
AI IMPACT Sets a new state of the art on the ARC-AGI-2 benchmark, demonstrating improved reasoning capabilities over existing frontier models.
RANK_REASON The cluster is about a new research paper detailing a novel solver for a specific AI benchmark.
AI-generated summary · Google Gemini · from 2 sources.
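
The contrast between majority voting and the holistic judging described in the summary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Candidate` structure, the prompt layout, and the `toy_judge` stand-in are hypothetical and are not taken from the paper, whose actual prompts and judge model are not described in the summary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[int]]


@dataclass
class Candidate:
    modality: str  # hypothetical label: "text", "image", or "code"
    trace: str     # reasoning trace that produced the answer
    answer: Grid   # proposed output grid


def majority_vote(candidates: List[Candidate]) -> Candidate:
    """Baseline: return the first candidate holding the most common answer."""
    counts = Counter(repr(c.answer) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return next(c for c in candidates if repr(c.answer) == winner)


def build_judge_prompt(task: str, candidates: List[Candidate]) -> str:
    """Place every candidate trace side by side in one prompt (assumed layout)."""
    parts = [f"Task:\n{task}", ""]
    for i, c in enumerate(candidates):
        parts.append(f"Candidate {i} ({c.modality}):")
        parts.append(f"Trace: {c.trace}")
        parts.append(f"Answer: {c.answer}")
        parts.append("")
    parts.append("Compare all candidates and reply with the index of the best one.")
    return "\n".join(parts)


def holistic_select(
    task: str, candidates: List[Candidate], judge: Callable[[str], int]
) -> Candidate:
    """Ask a judge to pick one candidate after seeing all traces at once."""
    index = judge(build_judge_prompt(task, candidates))
    if not 0 <= index < len(candidates):
        raise ValueError(f"judge returned invalid index {index}")
    return candidates[index]


def toy_judge(prompt: str) -> int:
    """Hypothetical stand-in for a model call: prefers a trace marked 'verified'."""
    current = -1
    for line in prompt.splitlines():
        if line.startswith("Candidate "):
            current = int(line.split()[1])
        elif line.startswith("Trace:") and "verified" in line:
            return current
    return 0


candidates = [
    Candidate("text", "guessed a horizontal flip", [[1, 0]]),
    Candidate("image", "guessed a horizontal flip", [[1, 0]]),
    Candidate("code", "program verified on every training pair", [[0, 1]]),
]

print(majority_vote(candidates).modality)                            # text
print(holistic_select("flip task", candidates, toy_judge).modality)  # code
```

In this toy setup the two matching guesses win a majority vote, while a judge that reads all traces together can still select the single candidate with stronger supporting evidence. In a real system the `judge` callable would wrap a model call rather than a keyword check.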