Character-trained AI models fail to maintain personas in agentic tasks

By PulseAugur Editorial · [1 source] · 2026-05-25 12:58

Researchers found that models fine-tuned to express a specific persona in a chat format struggle to maintain that persona in agentic settings. When these character-trained models were prompted to write emails as part of a simulated agentic task, their persona expression degraded significantly. This suggests that persona training, often done via SFT or DPO on chat data, does not generalize well to other output formats or task contexts.

IMPACT Persona training in chat formats may not transfer to agentic tasks, limiting the reliability of character-consistent AI agents.

RANK_REASON The cluster describes a research paper evaluating the generalization capabilities of fine-tuned language models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · Nathaniel Mitrani · 2026-05-25 12:58

Character-trained models can struggle to generalise

TL;DR: Character training holds up in chat but degrades in agentic settings. Wrapping the same checkpoint in a tool-use loop instead of a chat turn weakens persona expression, suggesting the training only partly transfers beyond the chat format…
