Researchers have introduced DRACULA, a new dataset designed to improve deep research agents by capturing user feedback on intermediate actions. The dataset, collected from 19 expert CS researchers, includes over 8,000 action preferences and 5,000 execution judgments related to synthesizing research papers. Initial findings indicate that large language models struggle to predict user-preferred actions, especially when users have unstated goals, highlighting the need for better action feedback mechanisms for long-horizon agents. AI
Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →
IMPACT Provides a new dataset for training and evaluating AI agents on complex, multi-step tasks.
RANK_REASON The cluster describes a new dataset and paper published on arXiv.