MaD Physics: Evaluating information seeking under constraints in physical environments

New benchmark MaD Physics tests AI's capacity for scientific discovery under constraints

By the PulseAugur editorial team · [1 source] · 2026-05-11 16:37

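Researchers have introduced MaD Physics, a new benchmark designed to evaluate how well AI agents carry out scientific discovery under real-world constraints. The benchmark focuses on how agents take measurements and draw conclusions when both the quantity and the quality of the data they can collect are limited. It comprises three environments built on modified physical laws, which prevents contamination from prior knowledge, and challenges agents to infer the underlying principles and make predictions about the future within a fixed budget. Initial evaluations with various Gemini models revealed shortcomings in structured exploration and data collection, indicating that their scientific reasoning has room for improvement.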

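Impact: Introduces a novel benchmark for evaluating how well AI performs scientific reasoning and data collection under realistic constraints, which may guide the development of future models.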

Ranking rationale: This cluster contains an academic paper introducing a new benchmark for evaluating AI capabilities.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Nenad Tomašev · 2026-05-11 16:37

MaD Physics: Evaluating information seeking under constraints in physical environments

Scientific discovery is fundamentally a resource-constrained process that requires navigating complex trade-offs between the quality and quantity of measurements due to physical and cost constraints. Measurements drive the scientific process by revealing novel phenomena to improv…

