Researchers have introduced MEDS (Math Education Digital Shadows), a new dataset designed to evaluate how large language models perform in mathematics and to identify potential biases. MEDS comprises 28,000 personas across 14 LLMs, simulating interactions between human learners and AI assistants. It goes beyond traditional benchmarks by incorporating measures of self-efficacy, math anxiety, and cognitive networks alongside proficiency scores.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Provides a new dataset for evaluating LLM math capabilities and biases, aiding the development of safer AI tutors.
RANK_REASON The cluster describes a new dataset and research paper released on arXiv.