Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 6h

Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA

Researchers have introduced DOSEBENCH, a benchmark that evaluates how well large language models (LLMs) handle temporal uncertainty in over-the-counter (OTC) medication dosing questions. It consists of 81 scenarios involving acetaminophen and ibuprofen and targets safety-critical reasoning such as tracking dose timing and adhering to product label constraints. Initial evaluations found that LLMs frequently struggle with rolling-window calculations and ambiguous cases, often producing confident-sounding but incorrect dosing advice.
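To make the rolling-window idea concrete, here is a minimal Python sketch of the kind of check such scenarios appear to require. It is not DOSEBENCH's code or scoring method. The function name `next_dose_decision`, its parameters, and the three-way "ok" / "wait" / "uncertain" outcome are assumptions made for illustration, and the limits are passed in by the caller rather than taken from any product label.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (time taken, amount in mg); a time of None means the user could not recall it.
Dose = Tuple[Optional[datetime], float]


def next_dose_decision(
    history: List[Dose],
    now: datetime,
    dose_mg: float,
    limit_mg: float,            # hypothetical cap on the total within one window
    min_interval: timedelta,    # hypothetical minimum spacing between doses
    window: timedelta = timedelta(hours=24),
) -> str:
    """Return "ok", "wait", or "uncertain" for a proposed dose at `now`."""
    # Temporal uncertainty: if any dose time is unknown, neither the spacing
    # nor the window total can be verified, so do not give a confident answer.
    if any(taken_at is None for taken_at, _ in history):
        return "uncertain"

    past = [(t, mg) for t, mg in history if t <= now]

    # Spacing rule: the proposed dose must not follow an earlier one too closely.
    if any(now - t < min_interval for t, _ in past):
        return "wait"

    # Rolling window: count only doses taken less than `window` before `now`,
    # not doses since midnight or since the first dose of the day.
    taken_in_window = sum(mg for t, mg in past if now - t < window)
    if taken_in_window + dose_mg > limit_mg:
        return "wait"

    return "ok"
```

The window is anchored at the current time, so a dose taken 25 hours ago no longer counts toward the total while one taken 23 hours ago still does. Returning "uncertain" when a timestamp is missing is one possible design choice for the ambiguous cases the summary mentions.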

IMPACT: Highlights LLM limitations in safety-critical temporal reasoning, suggesting a need for improved models in medical QA.

LLMs
DOSEBENCH
ibuprofen