A new study published on arXiv evaluates the performance of 11 large language models (LLMs) in estimating PTSD severity from clinical narratives. The research found that LLMs perform best when provided with detailed contextual information, such as subscale definitions and interview questions, and that increased reasoning effort improves accuracy. Open-weight models like Llama and DeepSeek showed performance plateaus beyond 70B parameters, while closed-weight models like gpt-o3-mini and GPT-5 continued to improve with newer generations. The study also demonstrated that LLMs could differentiate PTSD severity from other conditions and predict future healthcare expenditure.
AI IMPACT LLMs demonstrate potential for clinical utility in mental health assessment, particularly with enhanced contextual knowledge and reasoning strategies.
RANK_REASON Research paper published on arXiv detailing LLM performance on a specific task.
AI-generated summary · Google Gemini · from 1 source.