LLMs struggle with probabilistic reasoning, study finds

By PulseAugur Editorial · [2 sources] · 2026-06-05 17:59

A new study posted on arXiv finds that large language models struggle with probabilistic reasoning, particularly on counterintuitive problems. While the models perform well on standard probability exercises, their accuracy drops significantly on trickier scenarios designed to elicit heuristic thinking. The research also identifies a "token bias": performance degrades when problem formulations are disguised, and misleading prompts can reduce accuracy by up to 34%. These findings suggest that current LLMs are not yet robust probabilistic reasoners, despite their proficiency in other advanced mathematical tasks.
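To make "counterintuitive" concrete, here is a classic dice puzzle of the general kind such benchmarks target. It is an illustration only, not an item from the paper's datasets, and the function and predicate names are hypothetical. Told that at least one of two fair dice shows a six, many people answer 1/6 for "both are sixes"; exact enumeration of the 36 equally likely outcomes gives 1/11. The heuristic answer of 1/6 is correct only when the condition concerns one specific die.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def conditional_prob(event, condition) -> Fraction:
    """Exact P(event | condition) by counting outcomes (illustrative helper)."""
    conditioned = [o for o in OUTCOMES if condition(o)]
    hits = [o for o in conditioned if event(o)]
    return Fraction(len(hits), len(conditioned))


def both_six(o):
    return o == (6, 6)


def at_least_one_six(o):
    # 11 of the 36 outcomes satisfy this condition.
    return 6 in o


def first_die_six(o):
    # 6 of the 36 outcomes satisfy this condition.
    return o[0] == 6


if __name__ == "__main__":
    print(conditional_prob(both_six, at_least_one_six))  # 1/11
    print(conditional_prob(both_six, first_die_six))     # 1/6
```

The two calls differ only in the conditioning event, which is exactly the kind of subtle wording change that can trip up pattern-matching on a familiar problem.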

IMPACT Highlights limitations in LLM reasoning, suggesting caution in applications requiring precise probabilistic judgment.

paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Luca Avena, Gianmarco Bet, Bernardo Busoni · 2026-06-08 04:00

How reliable are LLMs when it comes to playing dice?

arXiv:2606.07515v1 Announce Type: cross Abstract: We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We constructed two datasets, respectively a set of standard exercises and a…
arXiv cs.AI TIER_1 English(EN) · Bernardo Busoni · 2026-06-05 17:59

How reliable are LLMs when it comes to playing dice?

We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We constructed two datasets, respectively a set of standard exercises and a set of counterintuitive exercises, designed to tr…
