PulseAugur
Live 11:42:35

Claude Opus 4.7 leads frontier agents on an AI-research-acceleration benchmark

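A new research paper proposes a benchmark that evaluates whether AI agents can autonomously implement a machine learning pipeline, with the goal of detecting early signs of recursive self-improvement. Frontier coding agents were each given three hours to build an AlphaZero-style self-play pipeline for Connect Four. Claude Opus 4.7 performed strongly, outperforming the external solver in most trials, while GPT-5.4 showed an unusual pattern in how it used its time budget.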

Impact: The benchmark could give earlier warning of AI self-improvement and may influence the direction of AI safety research.

Ranking rationale: This cluster contains an academic paper that proposes a new benchmark for AI research capability.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.LG · TIER_1 · English (EN) · Joshua Sherwood, Ben Aybar, Benjamin Kaplan

    Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver

    arXiv:2604.25067v1 · Announce Type: cross · Abstract: Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals fo…

  2. arXiv cs.LG · TIER_1 · English (EN) · Benjamin Kaplan

    Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver

    Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals for recursive self-improvement. We propose measuring…