New QSTRBench benchmark tests LLM spatial and temporal reasoning

By PulseAugur Editorial · [1 source] · 2026-05-18 13:26

Researchers have introduced QSTRBench, a benchmark for assessing the qualitative spatial and temporal reasoning capabilities of large language models. It covers several calculi, including Point Algebra, Allen's Interval Algebra, and the Region Connection Calculus; some of them, such as RCC-22, are published for the first time. Current frontier models perform better than random chance, but none consistently answers all questions correctly, and difficulty varies significantly across calculi.

IMPACT Introduces a new evaluation framework to better understand and improve LLM capabilities in complex reasoning tasks.

RANK_REASON The cluster contains a new academic paper introducing a novel benchmark for evaluating LLMs.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Robert E. Blackwell · 2026-05-18 13:26

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

We introduce an extensive qualitative spatial and temporal reasoning (QSTR) benchmark for evaluating large language models (LLMs). We pose questions concerning compositional reasoning (using composition tables, CT), converse relations, and conceptual neighbourhoods (CN) for QSTR …
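To make the three question types concrete, here is a minimal sketch for Point Algebra, the simplest calculus named in the summary. It is not taken from QSTRBench; the names COMPOSITION, CONVERSE, NEIGHBOURS, compose, and converse are illustrative assumptions, and the tables are the standard ones for three base relations between points on a line.

```python
from itertools import product

# Point Algebra base relations between two points on a line.
BASE = ("<", "=", ">")
ALL = frozenset(BASE)

# Converse: if x r y holds, then y CONVERSE[r] x holds.
CONVERSE = {"<": ">", "=": "=", ">": "<"}

# Composition table: given x r1 y and y r2 z, which relations
# between x and z remain possible?
COMPOSITION = {
    ("<", "<"): frozenset({"<"}),
    ("<", "="): frozenset({"<"}),
    ("<", ">"): ALL,  # nothing can be concluded
    ("=", "<"): frozenset({"<"}),
    ("=", "="): frozenset({"="}),
    ("=", ">"): frozenset({">"}),
    (">", "<"): ALL,  # nothing can be concluded
    (">", "="): frozenset({">"}),
    (">", ">"): frozenset({">"}),
}

# Conceptual neighbourhood: relations reachable from one another
# by a continuous change with no other relation in between.
NEIGHBOURS = {"<": {"="}, "=": {"<", ">"}, ">": {"="}}


def compose(r1, r2):
    """Compose two (possibly disjunctive) relations given as sets of base relations."""
    result = set()
    for a, b in product(r1, r2):
        result |= COMPOSITION[(a, b)]
    return frozenset(result)


def converse(r):
    """Return the converse of a (possibly disjunctive) relation."""
    return frozenset(CONVERSE[a] for a in r)


if __name__ == "__main__":
    print(sorted(compose({"<"}, {"<"})))
    print(sorted(compose({"<"}, {">"})))
    print(sorted(converse({"<", "="})))
```

A benchmark question of the compositional kind would, under this reading, ask a model to fill in one cell of such a table; larger calculi such as Allen's Interval Algebra follow the same pattern with more base relations.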
