Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models
A new study published on arXiv assessed 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 of them along with 10 open-source models. The research found that a significant portion of these models produce factual inaccuracies: 25-30% showed low accuracy, and over half violated operational thresholds. Many action-enabled models also lacked adequate privacy disclosures, pointing to systemic gaps in safety and compliance.
AI IMPACT: Highlights critical safety and compliance issues in medical AI, underscoring the need for stronger safeguards in patient care.