Researchers have developed ReDose, a dataset of 6,435 Reddit posts about substance use, intended to help physicians understand real-world drug use beyond the overdose cases they see clinically. The posts were annotated by a toxicologist and medical students with entities such as DRUG, DOSE, and EFFECT. In benchmarks, BiomedBERT performed strongly on DRUG extraction, while Llama-3 70B outperformed GPT-4 on overall extraction. The study highlights that accurately extracting EFFECT entities from user-generated content remains an open challenge.
IMPACT Supports the development of LLMs for extracting specialized medical data from social media, with potential benefits for drug safety and for understanding real-world drug use.
RANK_REASON The cluster contains an academic paper detailing a new dataset and benchmark for entity extraction in the medical domain.
AI-generated summary · Google Gemini · from 1 source.