New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations

By the PulseAugur editorial team · [3 sources] · 2026-05-06 17:42

Researchers have developed MRI-Eval, a benchmark that assesses large language models' knowledge of MRI physics and GE scanner operations. The benchmark comprises 1365 questions across three difficulty tiers. Models perform exceptionally well on standard multiple-choice questions, but their accuracy drops significantly on free-text recall, particularly for vendor-specific operational knowledge. This suggests that high scores on conventional tests may mask limitations in practical application, and that LLM outputs should be treated with caution when used for critical guidance.

Impact: Highlights potential limitations of LLMs in specialized technical domains and suggests caution when applying them to critical operational guidance.

Ranking rationale: The cluster contains a new academic paper introducing a benchmark for evaluating LLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 English(EN) · Perry E. Radau · 2026-05-07 04:00

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

arXiv:2605.05175v1 · Announce Type: cross · Abstract: Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanne…

arXiv cs.CL TIER_1 English(EN) · Perry E. Radau · 2026-05-06 17:42

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI pr…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-06 17:42

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI pr…
