Researchers have introduced MEDS (Math Education Digital Shadows), a new dataset designed to evaluate how large language models perform in mathematics and to identify potential biases. MEDS comprises 28,000 personas across 14 LLMs, simulating human and AI assistant interactions. It goes beyond traditional benchmarks by incorporating measures of self-efficacy, math anxiety, and cognitive networks alongside proficiency scores.
AI impact: Provides a new dataset for evaluating LLM math capabilities and biases, aiding the development of safer AI tutors.
Ranking rationale: The cluster describes a new dataset and research paper released on arXiv.
AI-generated summary · Google Gemini · from 2 sources.