Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 1mo

Tracing Persona Vectors Through LLM Pretraining

Researchers report that specific behavioral traits, such as sycophancy, are represented inside large language models by 'persona vectors'. These vectors form very early in pretraining: within the first 0.22% of training for the OLMo-3-7B model. Although the core representations are established quickly, the vectors continue to be refined throughout pretraining, and different methods of eliciting them reveal distinct aspects of the underlying behavior. The findings suggest that these representations are stable features that emerge early in pretraining, and the vectors were also shown to transfer to other models such as Apertus-8B.
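This brief does not say how the vectors are extracted or compared. As a rough illustration only, the sketch below assumes a common difference-of-means recipe: a persona vector is the mean hidden activation on trait-exhibiting responses minus the mean on neutral responses, and two checkpoints are compared by cosine similarity. All function names, the synthetic activations, and the two "checkpoints" are hypothetical and are not taken from the paper.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def persona_vector(trait_acts, baseline_acts):
    """Assumed recipe: mean(trait activations) - mean(baseline activations)."""
    mu_trait = mean_vector(trait_acts)
    mu_base = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_trait, mu_base)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synthetic_activations(direction, strength, n, rng, noise=0.5):
    """Stand-in for real hidden states: a scaled trait direction plus Gaussian noise."""
    return [
        [strength * d + rng.gauss(0.0, noise) for d in direction]
        for _ in range(n)
    ]


rng = random.Random(0)
dim = 16
trait_direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Hypothetical early checkpoint: the trait direction is present but weak.
early = persona_vector(
    synthetic_activations(trait_direction, 0.5, 200, rng),
    synthetic_activations(trait_direction, 0.0, 200, rng),
)

# Hypothetical late checkpoint: same direction, expressed more strongly.
late = persona_vector(
    synthetic_activations(trait_direction, 1.0, 200, rng),
    synthetic_activations(trait_direction, 0.0, 200, rng),
)

similarity = cosine(early, late)
print(f"cosine(early, late) = {similarity:.3f}")
```

On real models the activations would come from a chosen layer of each checkpoint rather than from synthetic data. A high cosine similarity between early and late vectors would be one way to quantify the "established early, refined later" pattern the brief describes.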

IMPACT Shows that key behavioral traits in LLMs are established very early in training, which could enable new safety interventions during pretraining.

OLMo-3-7B
Apertus-8B
Viktor Moskvoretskii