PulseAugur

New solver beats GPT-5.2 Pro and Gemini 3 Pro on ARC-AGI-2 benchmark

A new solver for the ARC-AGI-2 visual reasoning benchmark has achieved a top score of 76.1% on the public evaluation set. The solver treats reasoning modalities (text, image, and code) as search operators that generate diverse candidate solutions. It then judges the candidates holistically, comparing all candidate traces simultaneously within a single prompt, which helps recover correct hypotheses that only a minority of candidates support. The solver outperformed leading models such as GPT-5.2 Pro and Gemini 3 Pro by a significant margin.
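The two stages described above, candidate generation per modality followed by a single joint judging prompt, can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the names `Candidate`, `generate_candidates`, `build_judge_prompt`, `select_holistically`, and `majority_vote`, the prompt wording, and the `judge` callable (a stand-in for a model call) are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Grid = List[List[int]]
# Hypothetical operator type: takes a task description and yields
# (reasoning_trace, predicted_grid) pairs.
Operator = Callable[[str], Iterable[Tuple[str, Grid]]]


@dataclass
class Candidate:
    modality: str  # e.g. "text", "image", "code"
    trace: str     # the reasoning trace that produced the answer
    answer: Grid   # the predicted output grid


def generate_candidates(task: str, operators: Dict[str, Operator]) -> List[Candidate]:
    """Run every modality operator on the task and pool the results."""
    return [
        Candidate(modality, trace, answer)
        for modality, op in operators.items()
        for trace, answer in op(task)
    ]


def build_judge_prompt(task: str, candidates: List[Candidate]) -> str:
    """Place ALL candidate traces in one prompt so they are compared jointly."""
    parts = [f"Task:\n{task}"]
    for i, c in enumerate(candidates):
        parts.append(
            f"[Candidate {i}] modality={c.modality}\n"
            f"Trace: {c.trace}\nAnswer: {c.answer}"
        )
    parts.append("Compare all candidates together and reply with the index of the best one.")
    return "\n\n".join(parts)


def select_holistically(
    task: str, candidates: List[Candidate], judge: Callable[[str], int]
) -> Candidate:
    """Ask the (assumed) judge for one index over the whole candidate pool."""
    index = judge(build_judge_prompt(task, candidates))
    if not 0 <= index < len(candidates):
        raise ValueError(f"judge returned out-of-range index {index}")
    return candidates[index]


def majority_vote(candidates: List[Candidate]) -> Grid:
    """Baseline for contrast: return the most frequent answer, ignoring traces."""
    counts = Counter(tuple(map(tuple, c.answer)) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return [list(row) for row in winner]
```

In this sketch, majority voting can only ever return the most common answer, whereas a judge that sees every trace at once is free to pick an answer proposed by a single candidate. Whether it picks well depends entirely on the judge model, which is stubbed out here.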

IMPACT Sets a new state-of-the-art on the ARC-AGI-2 benchmark, demonstrating improved reasoning capabilities over existing frontier models.

RANK_REASON The cluster is about a new research paper detailing a novel solver for a specific AI benchmark.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Johan Land

    Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

    arXiv:2606.31543v1 · Abstract: Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I prese…

  2. arXiv cs.AI TIER_1 English(EN) · Johan Land

    Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

    Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I present a solver for ARC-AGI-2, a few-shot visual rea…