I made Claude, GPT and Gemini predict the entire 2026 World Cup. Here's the experiment design.

LLM: Claude, GPT-5.2, and Gemini predict the 2026 World Cup

By the PulseAugur editorial team · [1 source] · 2026-06-11 17:53

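An experiment benchmarked three leading LLMs (Claude Opus 4.8, GPT-5.2, and Gemini 3.1 Pro) on their ability to predict the 2026 World Cup. The models were tested under three conditions: using only their internal knowledge, with access to web browsing, and with a standardized dataset of FIFA rankings and Elo ratings. The design is meant to determine whether performance differences stem from a model's intrinsic knowledge or from its data retrieval and processing abilities. The experiment showed that the models' predictions were inconsistent given the information they were provided. GPT-5.2 exhibited some odd behavior, such as inventing football rules, while Claude misread the schema documentation.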
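The setup described above (three models, three conditions, and per-match scorelines with win/draw/loss probabilities) can be pictured with a minimal sketch. It is not the author's code: the condition labels, the `MatchPrediction` record, and the `validation_errors` helper are hypothetical names introduced here for illustration, and the checks are only the basic sanity rules any such prediction record would need to satisfy.

```python
from dataclasses import dataclass
from itertools import product

# Model names come from the article; the condition labels are hypothetical
# shorthand for the three setups it describes.
MODELS = ["Claude Opus 4.8", "GPT-5.2", "Gemini 3.1 Pro"]
CONDITIONS = ["internal_knowledge", "web_browsing", "standardized_dataset"]


@dataclass(frozen=True)
class MatchPrediction:
    """Hypothetical record for one predicted group match."""
    home_goals: int
    away_goals: int
    p_home_win: float
    p_draw: float
    p_away_win: float


def experiment_grid() -> list[tuple[str, str]]:
    """Return every (model, condition) pair: 3 models x 3 conditions."""
    return list(product(MODELS, CONDITIONS))


def validation_errors(pred: MatchPrediction, tol: float = 1e-6) -> list[str]:
    """Collect basic sanity problems in a single prediction record."""
    errors = []
    if pred.home_goals < 0 or pred.away_goals < 0:
        errors.append("negative goal count")
    probs = (pred.p_home_win, pred.p_draw, pred.p_away_win)
    if any(p < 0.0 or p > 1.0 for p in probs):
        errors.append("probability outside [0, 1]")
    if abs(sum(probs) - 1.0) > tol:
        errors.append("probabilities do not sum to 1")
    return errors
```

Running each model's output through a check like this before scoring is one way to surface the kind of inconsistency the summary mentions, such as a record that does not follow the expected schema.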

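Impact: The experiment highlights the limits of LLMs in consistency and rule adherence, suggesting that complex prediction tasks call for better prompt engineering and data handling.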

Ranking rationale: This cluster describes an experiment comparing LLM performance on a specific task, including its methodology and observed behaviors, which is consistent with a research report.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to (LLM tag) · Willian Pinho · English · 2026-06-11 17:53

I made Claude, GPT and Gemini predict the entire 2026 World Cup. Here's the experiment design.

The 2026 World Cup kicks off today: 48 teams, 104 matches, five weeks. I'm using it as a benchmark.

Three frontier models (Claude Opus 4.8, GPT-5.2 and Gemini 3.1 Pro) predicted every group match with scorelines and win/draw/loss probabilities, then a complete knockout …
