PulseAugur

New MEDS dataset maps LLM math reasoning, bias, and attitudes

Researchers have introduced MEDS (Math Education Digital Shadows), a dataset for evaluating how large language models perform in mathematics and for identifying potential biases. MEDS comprises 28,000 personas across 14 LLMs, simulating both humans and AI assistants. Beyond the proficiency scores of traditional benchmarks, it includes measures of self-efficacy, math anxiety, and cognitive networks.

Impact: Provides a new dataset for evaluating LLM math capabilities and biases, aiding the development of safer AI tutors.

Ranking rationale: The cluster describes a new dataset and research paper released on arXiv.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.AI · TIER_1 · English (EN) · Naomi Esposito, Anthony Tricarico, Luisa Porzio, Ali Aghazadeh Ardebili, Massimo Stella

    Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

    arXiv:2604.27618v1. Abstract: To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mapping how large language models rea…

  2. arXiv cs.LG · TIER_1 · English (EN) · Massimo Stella

    Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

    To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mapping how large language models reason about and report mathematics across human- a…