BRIDGE: Predicting Human Task Completion Time From Model Performance

New BRIDGE framework predicts human task completion time from model performance

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

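Researchers have developed a new framework called BRIDGE that uses item response theory to predict human task completion time from AI model performance. The method estimates latent task difficulty and model capability from performance data across a range of benchmarks. The framework shows that latent task difficulty is linearly related to the logarithm of human completion time, so completion times for new benchmarks can be inferred from model performance alone. The approach can also forecast future model capabilities, and it reproduces existing exponential scaling results, which indicate that the range of solvable tasks doubles roughly every six months.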
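The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the simplest item response model (the one-parameter Rasch form, where the probability that a model solves a task is a sigmoid of ability minus difficulty), it assumes model abilities are already known, and every function name and number below is hypothetical. It shows two steps: estimating a task's latent difficulty from which models solved it, then mapping difficulty to human minutes through a line fitted against log completion time.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_difficulty(abilities, solved, lo=-20.0, hi=20.0, iters=100):
    """Maximum-likelihood Rasch difficulty for one task (assumed model).

    With known abilities theta_j, the likelihood equation reduces to
    sum_j sigmoid(theta_j - b) == number of models that solved the task.
    The left side decreases in b, so bisection finds the root.
    """
    target = sum(solved)
    if not 0 < target < len(solved):
        raise ValueError("estimate is unbounded if all or no models solve the task")
    for _ in range(iters):
        mid = (lo + hi) / 2
        expected = sum(sigmoid(theta - mid) for theta in abilities)
        if expected > target:
            lo = mid  # too many expected solves: the task must be harder
        else:
            hi = mid
    return (lo + hi) / 2


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_minutes(difficulty, slope, intercept):
    """Invert difficulty = slope * log(minutes) + intercept."""
    return math.exp((difficulty - intercept) / slope)


# Made-up calibration set: tasks with known human times and fitted difficulties.
human_minutes = [1.0, 4.0, 16.0, 64.0]
difficulties = [-2.0, -1.0, 0.0, 1.0]
slope, intercept = fit_line([math.log(t) for t in human_minutes], difficulties)

# A new task with no human timing: difficulty comes from model outcomes only.
new_difficulty = estimate_difficulty(abilities=[-1.0, 1.0], solved=[0, 1])
new_minutes = predict_minutes(new_difficulty, slope, intercept)
```

In this toy calibration the difficulties are exactly linear in log minutes, so the fit is perfect; with real benchmark data the line would carry residual error, and the predicted time would be an estimate with uncertainty rather than a point value.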

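Impact: By predicting human task completion time from performance data alone, the framework could make AI model evaluation more efficient and scalable.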

Ranking rationale: This cluster contains a paper detailing a new framework and method for evaluating AI capabilities.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Fengyuan Liu, Jay Gala, Nilaksh, Dzmitry Bahdanau, Siva Reddy, Hugo Larochelle · 2026-07-03 04:00

BRIDGE: Predicting Human Task Completion Time From Model Performance

arXiv:2602.07267v2. Abstract: Evaluating the real-world capabilities of AI systems requires grounding benchmark performance in human-interpretable measures of task difficulty. Existing approaches that rely on direct human task completion time annotations are…

