Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 6d

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

A new study posted on arXiv examines 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 alongside 10 open-source models. It finds that a significant portion of these models produce factual inaccuracies: 25-30% show low accuracy, and more than half violate operational thresholds. Many action-enabled models also lack adequate privacy disclosures, indicating systemic gaps in safety and compliance.

IMPACT Highlights critical safety and compliance issues in medical AI and the need for stronger safeguards in patient care.

LLMs
arXiv
Sunday Ogundoyin
MedGPTs
HAA-MedGPT