LLM benchmark 1rok pits GPT-5.5, Gemini 3.1, Grok 4.3 in stock-picking contest

By the PulseAugur editorial team · [1 source] · 2026-05-20 19:01

A new benchmark, dubbed 1rok, has been launched to evaluate the stock-picking capabilities of frontier large language models. It assigns each participating LLM a virtual portfolio of $100,000 and tasks it with selecting stocks weekly, with performance tracked against market outcomes. The aim is a more practical, downstream evaluation of LLMs than traditional coding and reasoning benchmarks, one focused on decision-making under uncertainty.
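To make the setup concrete, here is a minimal sketch of how a weekly paper portfolio of this kind could be tracked. The summary does not say how 1rok sizes positions or scores results, so the equal weighting, the hold-cash-when-empty rule, the function name `simulate_portfolio`, and the tickers and returns in the usage note are all assumptions made for illustration, not the benchmark's actual method.

```python
from typing import Dict, List

STARTING_VALUE = 100_000.0  # from the summary: each model starts with $100,000


def simulate_portfolio(
    weekly_picks: List[List[str]],
    weekly_returns: List[Dict[str, float]],
    starting_value: float = STARTING_VALUE,
) -> List[float]:
    """Return the portfolio value after each week (index 0 is the start).

    Assumption: each week the whole portfolio is split equally across that
    week's picks. Assumption: a week with no picks is held in cash (0% return).
    """
    values = [starting_value]
    for picks, returns in zip(weekly_picks, weekly_returns):
        if picks:
            # Equal-weight average of the picked tickers' returns for the week.
            week_return = sum(returns[ticker] for ticker in picks) / len(picks)
        else:
            week_return = 0.0
        values.append(values[-1] * (1.0 + week_return))
    return values


def total_return(values: List[float]) -> float:
    """Cumulative return from the first to the last recorded value."""
    return values[-1] / values[0] - 1.0
```

With made-up data, `simulate_portfolio([["AAA", "BBB"], ["AAA"]], [{"AAA": 0.10, "BBB": -0.02}, {"AAA": 0.05}])` compounds a 4% week and then a 5% week, and `total_return` on the result gives the cumulative figure that could be compared across models or against an index.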

Impact: Provides a novel benchmark for evaluating LLM decision-making under uncertainty, moving beyond traditional coding and reasoning tasks.

Ranking rationale: The article describes a new benchmark for evaluating LLMs on a specific downstream task (stock picking), which is a form of research and evaluation. [lever_c_demoted from research: ic=1 ai=1.0]

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to — LLM tag TIER_1 English(EN) · Achal Jhawar · 2026-05-20 19:01

Which LLM is the best stock picker? I built a benchmark to find out.

Every other week there's a new GPT-vs-Claude-vs-Gemini benchmark on coding or math or reasoning. None of them tell you whether the model can actually make a decision under uncertainty, where the answer isn't in the training data and the result shows up two weeks later in a P&a…
