PulseAugur
Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

New solver outperforms GPT-5.2 Pro and Gemini 3 Pro on the ARC-AGI-2 benchmark

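A new solver for the ARC-AGI-2 visual reasoning benchmark achieved a top score of 76.1% on the public evaluation set. The solver treats reasoning modalities such as text, images, and code as search operators that generate diverse candidate solutions. It then applies holistic judging, comparing all candidate traces at once in a single prompt, which helps recover correct minority hypotheses. The solver significantly outperforms leading models such as GPT-5.2 Pro and Gemini 3 Pro.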
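The selection step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Candidate` type, the prompt wording, and the `judge` callable (standing in for a model call that returns a hypothesis number) are all hypothetical names introduced here. It only shows how placing every trace in one prompt lets a judge pick a hypothesis that a plain majority vote would discard.

```python
from collections import defaultdict
from dataclasses import dataclass

Grid = tuple[tuple[int, ...], ...]


@dataclass(frozen=True)
class Candidate:
    modality: str  # "text", "image", or "code", the operators named in the summary
    trace: str     # reasoning trace produced by that operator
    output: Grid   # predicted output grid


def group_by_output(candidates):
    """Group candidates that predict the same grid, in first-seen order."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand.output].append(cand)
    return list(groups.values())


def majority_vote(candidates):
    """Baseline for comparison: return the most frequently predicted grid."""
    return max(group_by_output(candidates), key=len)[0].output


def build_holistic_prompt(candidates):
    """Place every distinct hypothesis, with all of its traces, in ONE prompt."""
    lines = ["Compare all hypotheses below and answer with the number of the best one."]
    for idx, group in enumerate(group_by_output(candidates), start=1):
        lines.append(f"Hypothesis {idx} (supported by {len(group)} trace(s)):")
        for cand in group:
            lines.append(f"  [{cand.modality}] {cand.trace}")
    return "\n".join(lines)


def select_holistically(candidates, judge):
    """`judge` is a hypothetical callable: prompt text -> 1-based hypothesis index."""
    groups = group_by_output(candidates)
    choice = judge(build_holistic_prompt(candidates))
    return groups[choice - 1][0].output
```

In this sketch, two traces that agree on a wrong grid would win `majority_vote`, while `select_holistically` can still return the single dissenting grid if the judge finds its trace more convincing.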

Impact: Sets a new state of the art on the ARC-AGI-2 benchmark, demonstrating reasoning gains beyond existing frontier models.

Ranking rationale: This cluster covers a new research paper describing a new solver for a specific AI benchmark.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.AI · TIER_1 · English (EN) · Johan Land

    Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

    arXiv:2606.31543v1 Announce Type: new Abstract: Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I prese…

  2. arXiv cs.AI · TIER_1 · English (EN) · Johan Land

    Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

    Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I present a solver for ARC-AGI-2, a few-shot visual rea…