MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

New MRI-Eval Benchmark Shows LLMs Struggle with GE Scanner Operations

By the PulseAugur editorial team · [3 sources] · 2026-05-06 17:42

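Researchers have developed MRI-Eval, a new benchmark designed to evaluate how well large language models understand MRI physics and GE scanner operations. The benchmark comprises 1,365 questions across three difficulty tiers. The results show that although models perform well on standard multiple-choice questions, their accuracy drops markedly on free-text recall tests, especially for vendor-specific operational knowledge. This suggests that high scores on conventional tests may mask limitations in practical use, and that caution is needed when relying on LLM output for critical guidance.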

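Impact: Highlights the potential limitations of LLMs in specialized technical domains and advises caution when relying on them for critical operational guidance.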

Ranking rationale: This cluster contains an academic paper introducing a new benchmark for LLM evaluation.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 English(EN) · Perry E. Radau · 2026-05-07 04:00

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

arXiv:2605.05175v1 Announce Type: cross Abstract: Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanne…
arXiv cs.CL TIER_1 English(EN) · Perry E. Radau · 2026-05-06 17:42

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI pr…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-06 17:42

MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI pr…

