LLMs outperform fine-tuned models on rare suicide circumstances

By PulseAugur Editorial · [2 sources] · 2026-05-21 00:33

A new research paper compares large language models (LLMs) with fine-tuned RoBERTa models at extracting complex circumstances from death investigation narratives. The study introduces a "Complexity Score" algorithm to choose the best prompting strategy for each circumstance. It finds that LLMs excel at low-prevalence circumstances, where fine-tuned models lack enough training data. These performance patterns hold across frontier LLMs such as GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B. The authors therefore suggest a hybrid architecture in which LLMs handle rare circumstances and fine-tuned models handle common ones.
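The hybrid split described above can be sketched as a simple router that sends each circumstance label to one extractor or the other based on how often it appears in the training data. This is a minimal illustration, not the paper's method. The prevalence threshold, the `Circumstance` type, the `route`/`build_routing_table` helpers and the toy counts are all hypothetical. The summary does not say how the authors draw the line between "rare" and "common", and the Complexity Score algorithm is not modelled here.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the paper's actual criterion is not given in this summary.
RARE_PREVALENCE_THRESHOLD = 0.02


@dataclass
class Circumstance:
    """A circumstance label with (hypothetical) training-set counts."""
    name: str
    positive_examples: int  # narratives labelled positive for this circumstance
    total_examples: int     # all narratives in the training set

    @property
    def prevalence(self) -> float:
        # Guard against an empty training set; treat it as zero prevalence.
        if self.total_examples == 0:
            return 0.0
        return self.positive_examples / self.total_examples


def route(c: Circumstance, threshold: float = RARE_PREVALENCE_THRESHOLD) -> str:
    """Send rare circumstances to an LLM and common ones to a fine-tuned classifier."""
    return "llm" if c.prevalence < threshold else "fine_tuned"


def build_routing_table(circumstances, threshold: float = RARE_PREVALENCE_THRESHOLD):
    """Map each circumstance name to the extractor that should handle it."""
    return {c.name: route(c, threshold) for c in circumstances}


if __name__ == "__main__":
    # Toy counts for illustration only; not NVDRS data.
    labels = [
        Circumstance("common_circumstance", 4200, 10000),
        Circumstance("rare_circumstance", 90, 10000),
    ]
    print(build_routing_table(labels))
```

Under these toy counts, the common label (prevalence 0.42) is routed to the fine-tuned model and the rare one (0.009) to the LLM. A real deployment would probably tune the threshold per circumstance on validation data rather than fix a single global value.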

IMPACT Suggests a hybrid LLM architecture for specialized data extraction tasks, potentially improving efficiency in fields like public health.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Geoffrey Martin, Xuan Zhong Feng, Yifan Peng · 2026-05-22 04:00

Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity

arXiv:2605.21845v1 Announce Type: new Abstract: Suicide is a leading cause of death in the United States, and understanding the circumstances that precede it requires extracting structured information from death investigation narratives. Many of these circumstances require semant…
arXiv cs.CL TIER_1 English(EN) · Yifan Peng · 2026-05-21 00:33

Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity

Suicide is a leading cause of death in the United States, and understanding the circumstances that precede it requires extracting structured information from death investigation narratives. Many of these circumstances require semantic inference beyond simple keyword matching. We …
