Researchers have developed MDIA (Multi-agent Diagnostic Intelligence Agent), a system built on a 7-node clinical reasoning graph, and report strong performance on the HealthBench Professional benchmark. When evaluated using OpenAI's GPT-5.4-2026-03-05, MDIA scored 0.6272, 3.72 percentage points above ChatGPT for Clinicians. The study indicates that architectural design, including specialty routing and context preservation, has a substantial effect on agentic performance, and that prompt engineering alone does not account for the gains. The choice of grading model also introduces variability: MDIA scored 0.6585 when graded by Gemini 2.5 Pro, which underscores the need for multi-grader evaluation.
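The arithmetic behind these figures can be checked with a short sketch. It assumes that both scores sit on the same 0-1 scale, that the GPT-5.4-2026-03-05 figure is a grader score comparable to the Gemini 2.5 Pro one, and that the 3.72-point margin was measured under the GPT-5.4 evaluation; the summary implies these points but does not state them. The variable and function names are illustrative only.

```python
from statistics import mean

# Scores quoted in the summary, keyed by evaluating model (0-1 scale assumed).
mdia_scores = {
    "GPT-5.4-2026-03-05": 0.6272,
    "Gemini 2.5 Pro": 0.6585,
}

# Reported margin over ChatGPT for Clinicians, in percentage points
# (assumed to apply to the GPT-5.4 evaluation).
margin_points = 3.72


def to_points(delta: float) -> float:
    """Convert a difference on the 0-1 scale to percentage points."""
    return round(delta * 100, 2)


# Spread between the two graders for the same system.
grader_gap = to_points(max(mdia_scores.values()) - min(mdia_scores.values()))

# Baseline score implied by the reported margin (not quoted in the summary).
implied_baseline = round(mdia_scores["GPT-5.4-2026-03-05"] - margin_points / 100, 4)

# A simple multi-grader aggregate: the unweighted mean across graders.
mean_score = mean(mdia_scores.values())

print(f"grader gap: {grader_gap} points")
print(f"implied baseline score: {implied_baseline}")
print(f"mean across graders: {mean_score:.4f}")
```

On these figures the gap between graders is 3.13 points, the same order of magnitude as the 3.72-point margin over the baseline, which is consistent with the call for multi-grader evaluation.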
Impact: Demonstrates that architectural improvements in AI agents can boost performance on clinical benchmarks, suggesting a path beyond prompt engineering.
Ranking rationale: Academic paper detailing a new AI system and its performance on a benchmark.
- ChatGPT for Clinicians
- Gemini 2.5 Pro
- GPT-5.4-2026-03-05
- HealthBench Professional
- OpenAI
- Roberto Cruz Perez
AI-generated summary · Google Gemini · From 1 source.