Researchers have developed StorySim, a new framework for evaluating the theory-of-mind (ToM) and world-modeling (WM) capabilities of large language models. The system generates novel stories to test how well LLMs track character perspectives and mental states, with the aim of avoiding contamination from pre-training data. Experiments with StorySim show that current LLMs perform better on WM tasks than on ToM tasks, reason more accurately about people than about inanimate objects, and sometimes rely on heuristics rather than genuine reasoning.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a novel method for evaluating how well LLMs model mental states, potentially guiding future research in AI alignment and reasoning.
RANK_REASON Academic paper introducing a new evaluation framework for LLM capabilities.