PulseAugur

Medical LLMs show significant factual errors and policy violations

A new study published on arXiv assessed 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 along with 10 open-source models. The research found that many of these models produce factual inaccuracies: 25-30% showed low accuracy, and over half violated operational thresholds. Many action-enabled models also lacked adequate privacy disclosures, pointing to systemic gaps in safety and compliance.

Impact: Highlights critical safety and compliance issues in medical AI that call for stronger safeguards in patient care.

Ranking rationale: The cluster contains an academic paper detailing a large-scale assessment of medical LLMs.

Read on arXiv cs.CL

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Rahat Masood

    Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

    Medical large language models (LLMs), including custom medical GPTs (MedGPTs) and open-source models, are increasingly deployed on web platforms to provide clinical guidance. However, they pose risks of hallucination, policy noncompliance, and unsafe design. We conduct a large-sc…