Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 5d

Which LLM is the best stock picker? I built a benchmark to find out.

A new benchmark, dubbed 1rok, has been launched to evaluate the stock-picking capabilities of frontier large language models. The benchmark assigns each participating LLM a virtual portfolio of $100,000 and tasks them with selecting stocks weekly, with performance tracked against market outcomes. This initiative aims to provide a more practical, downstream evaluation of LLMs beyond traditional coding and reasoning benchmarks, focusing on decision-making under uncertainty. AI

IMPACT Provides a novel benchmark for evaluating LLM decision-making under uncertainty, moving beyond traditional coding and reasoning tasks.

OpenAI
Google
xAI
GPT-5.5
Gemini 3.1 Pro Preview
Kimi K2.6
GLM-5.1
DeepSeek V4 Pro
Moonshot
Grok 4.3
MiniMax M2.7
1rok