PulseAugur

LLMs struggle to reproduce physics experiment results, failing numerical simulations

A new preprint from Peking University evaluated whether large language models can reproduce the numerical results of experimental physics papers. The researchers found that every tested LLM, including OpenAI Codex powered by GPT-5.3, achieved a 0% end-to-end callback rate, meaning none could replicate any paper's full numerical results. The models showed strong comprehension of the papers' methodologies but consistently made errors in data analysis and numerical simulation, which led to incorrect final results. The study identified several failure modes, including formula implementation errors and oversimplification of complex physical models.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT LLMs struggle with complex numerical simulation and data analysis in scientific research, indicating limitations beyond text comprehension.

RANK_REASON Academic paper evaluating LLM capabilities on a new domain (physics simulation).


COVERAGE [1]

  1. LessWrong (AI tag) TIER_1 · fessus ·

    AI Is Bad at Physics

    There’s a new preprint (https://arxiv.org/pdf/2603.27646) from Peking University in China that assesses LLM capabilities in reproducing results from experimental physics papers. Their finding? All the agents had a 0% …