PulseAugur / Brief

last 24h
[1/1] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

    Researchers have introduced MedVIGIL, an evaluation suite for testing the trustworthiness of medical vision-language models (VLMs). The suite focuses on whether these models recognize when visual evidence is compromised or misleading, a critical factor for clinical use. MedVIGIL comprises 300 cases, curated and annotated by board-certified radiologists, that assess model performance under various forms of broken visual evidence. The benchmark revealed a significant gap between human performance and current models: the strongest audited model, Claude Opus 4.7, scored considerably lower than the independent radiologist baseline.

    IMPACT: Establishes a new benchmark for evaluating the trustworthiness of medical AI, highlighting current model limitations in recognizing compromised visual evidence.