PulseAugur

New Dataset Extracts Drug Insights from Reddit

Researchers have developed ReDose, a dataset of 6,435 Reddit posts about substance use, to help physicians understand real-world drug usage beyond clinical overdose cases. The dataset, annotated by a toxicologist and medical students, labels entities such as DRUG, DOSE, and EFFECT. In benchmarks across several models, BiomedBERT performed strongly on DRUG entity extraction, while Llama-3 70B outperformed GPT-4 in overall extraction. The study finds that accurately extracting EFFECT entities from user-generated content remains a challenge.

IMPACT Provides a dataset and benchmark for extracting specialized medical data from social media with LLMs, potentially improving drug safety and understanding.

RANK_REASON The cluster contains an academic paper detailing a new dataset and benchmark for entity extraction in the medical domain.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English (EN) · Zewei Wang, Zihan Xu, Yishu Wei, Michael Chary, Yifan Peng

    Curation and Extraction of Drug-Related Entities from Reddit Platform

    arXiv:2605.26445v1. Abstract: Physicians learn primarily about illicit drugs from clinical overdose cases, limiting their understanding of real-world usage. Meanwhile, drug users share first-hand experiences online, offering insights into dosage and effects of d…