PulseAugur

MDIA agent achieves high scores on HealthBench Professional benchmark

Researchers have developed MDIA, a Multi-agent Diagnostic Intelligence Agent built on a 7-node clinical reasoning graph, and report strong results on the HealthBench Professional benchmark. When evaluated using OpenAI's GPT-5.4-2026-03-05, MDIA scored 0.6272, surpassing ChatGPT for Clinicians by 3.72 percentage points. The study attributes the gain to architectural design, including specialty routing and context preservation, rather than to prompt engineering alone. The choice of grading model also introduces variability: MDIA scored 0.6585 when graded by Gemini 2.5 Pro, which highlights the need for multi-grader evaluations.
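The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function and variable names are hypothetical, and it assumes the 3.72-point margin was measured under the same GPT-5.4 evaluation as the 0.6272 score, which the summary does not state explicitly.

```python
# Hypothetical helpers for sanity-checking the reported scores.
# All names here are illustrative; only the three numeric inputs come from the summary.

def implied_baseline(score: float, margin_pp: float) -> float:
    """Back out the comparison system's score from a margin in percentage points."""
    return round(score - margin_pp / 100.0, 4)

def grader_gap_pp(score_a: float, score_b: float) -> float:
    """Absolute difference between two graders' scores, in percentage points."""
    return round(abs(score_a - score_b) * 100.0, 2)

MDIA_GPT_SCORE = 0.6272      # MDIA evaluated with GPT-5.4-2026-03-05 (from the summary)
MDIA_GEMINI_SCORE = 0.6585   # MDIA graded by Gemini 2.5 Pro (from the summary)
MARGIN_PP = 3.72             # reported lead over ChatGPT for Clinicians

# Assumption: the margin and the 0.6272 score share the same evaluation setup.
baseline = implied_baseline(MDIA_GPT_SCORE, MARGIN_PP)
gap = grader_gap_pp(MDIA_GPT_SCORE, MDIA_GEMINI_SCORE)

print(f"Implied comparison score: {baseline:.4f}")
print(f"Grader-to-grader gap: {gap:.2f} percentage points")
```

Under that assumption, the comparison system's implied score is about 0.59, and switching graders moves MDIA's score by 3.13 percentage points. That shift is close in size to the reported 3.72-point lead, which is consistent with the summary's call for multi-grader evaluation.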

Impact: Shows that architectural improvements in AI agents can significantly boost performance on clinical benchmarks, suggesting a path beyond prompt engineering.

Ranking rationale: Academic paper detailing a new AI system and its performance on a benchmark.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.AI · English (EN) · Roberto Cruz, David Rey-Blanco

    MDIA: A Multi-Agent Diagnostic Intelligence Pipeline on HealthBench Professional

    arXiv:2605.24699v1. Abstract: Most reported gains on agentic-LLM clinical benchmarks are often attributed to prompt engineering, yet our results suggest that larger improvements can come from architectural and engine-level design. We present MDIA, a Multi-agent …