A new study evaluated 15 advanced AI chatbots on their ability to perform emergency psychiatric triage using 112 clinical vignettes. The chatbots identified true emergencies reliably, with an under-triage rate of only 5.6% for critical cases. However, they tended to over-triage less urgent situations, and the mean signed ordinal error indicated a net overestimation of urgency. Overall accuracy across all triage levels varied widely between models, ranging from 42.0% to 71.8%.
Impact: Highlights the need for improved calibration in AI models used for sensitive applications like mental health triage.
Ranking rationale: Academic paper evaluating AI chatbot performance on a specific task.
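The three metrics named in the summary (overall accuracy, under-triage rate for critical cases, and mean signed ordinal error) can be illustrated with a minimal sketch. The summary does not state the study's triage scale or exact formulas, so everything here is an assumption: triage levels are encoded as integers where a higher number means more urgent, the signed error is taken as predicted minus true (so a positive mean means net over-triage), and the function name `triage_metrics` and the sample data are hypothetical.

```python
def triage_metrics(true_levels, predicted_levels, emergency_level):
    """Compute illustrative triage metrics from ordinal labels.

    Assumptions (not taken from the study): levels are integers where a
    larger value means higher urgency, and `emergency_level` is the code
    for the most critical category.
    """
    if not true_levels or len(true_levels) != len(predicted_levels):
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(true_levels, predicted_levels))

    # Share of vignettes where the predicted level matches the reference.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # Mean of (predicted - true): above zero means net over-triage,
    # below zero means net under-triage.
    mean_signed_error = sum(p - t for t, p in pairs) / len(pairs)

    # Among true emergencies, the share rated below the emergency level.
    emergencies = [(t, p) for t, p in pairs if t == emergency_level]
    under_triage_rate = (
        sum(1 for t, p in emergencies if p < t) / len(emergencies)
        if emergencies
        else None
    )

    return {
        "accuracy": accuracy,
        "mean_signed_error": mean_signed_error,
        "under_triage_rate": under_triage_rate,
    }


# Hypothetical labels on an assumed 0 (routine) to 3 (emergency) scale.
example_true = [0, 1, 2, 3, 3, 1]
example_pred = [1, 1, 3, 3, 2, 2]
example_metrics = triage_metrics(example_true, example_pred, emergency_level=3)
```

Under this convention, a model can have a low under-triage rate for emergencies while still showing a mean signed error above zero, which is the pattern the summary describes.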
- AGIEval
- arXiv
- Claude 3 Opus
- NeurIPS
- Gemini 1.5 Pro
- GPT-4
- Llama 3
- MMLU
- Mistral Large
- Veith Weilnhammer
- AI chatbots
AI-generated summary · Google Gemini · From 1 source.