tool · [1 source] · 2026-05-19 21:23

New model DITTO learns human behavior simulation via verbal feedback

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed DITTO, a model that learns to simulate human behavior by using verbal feedback as a primary signal in reinforcement learning. The approach, described in a new paper, treats subjective, multi-faceted guidance as a first-class input and optimizes for rollouts that improve in response to that feedback. DITTO showed a 36% improvement over its base model and outperformed GPT-5.4 on six benchmarks in the newly introduced SOUL suite, which comprises ten tasks spanning different kinds of human behavior simulation.

IMPACT This research introduces a novel method for training LLMs to better simulate human behavior, potentially improving their utility in roles requiring nuanced social understanding.

RANK_REASON The cluster contains an academic paper detailing a new model and benchmark for training LLMs.

COVERAGE [1]

arXiv cs.CL TIER_1 · Maarten Sap · 2026-05-19 21:23

Reinforcing Human Behavior Simulation via Verbal Feedback

Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and…
