Brief · PulseAugur

TOOL · arXiv cs.CV · 4d

MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

Researchers have introduced MedFM-Robust, a benchmark for evaluating the robustness of medical foundation models. It covers both vision-language models, such as LLaVA-Med and GPT-4o, and segmentation models, such as MedSAM. The aim is to check whether these models perform dependably in real-world clinical settings.

IMPACT: Establishes a standard for evaluating the reliability of AI in clinical diagnostics and treatment planning.

GPT-4o
Gemini
MedGemma
MedSAM
SAM-Med2D
Yifang Wang
MedFM-Robust
LLaVA-Med