Researchers have developed DITTO, a model that learns to simulate human behavior by using verbal feedback as a primary signal in reinforcement learning. The approach, detailed in a new paper, treats subjective, multi-faceted guidance as a first-class input and optimizes for rollouts that improve based on that feedback. DITTO showed a 36% improvement over its base model and outperformed GPT-5.4 on six benchmarks in the newly introduced SOUL suite, which comprises ten tasks covering various kinds of human behavior simulation.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This research introduces a method for training LLMs to better simulate human behavior, which could make them more useful in roles requiring nuanced social understanding.
RANK_REASON The cluster contains an academic paper detailing a new model and benchmark for training LLMs.