Researchers have identified a shared internal 'preference vector' in large language models that shapes their behavior across different personas. By training probes on activation data from Gemma-3-27B and Qwen-3.5-122B, they found that this vector both tracks and can steer the model's task and output choices. The representation remains largely consistent even when the model adopts contrasting personas, such as a helpful assistant versus an 'evil' one.
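The summary describes the general technique of training a linear probe on model activations and then steering behavior along the learned direction. The sketch below is an illustration only, on synthetic data with a toy hidden size — it is not the authors' code, and the variable names (`acts`, `probe_dir`, the steering scale) are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for residual-stream activations: two clusters of hidden
# states separated along a hypothetical "preference" direction.
d = 16                        # toy hidden size; real models use thousands
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

n = 200
labels = rng.integers(0, 2, size=n)        # 0 = option A, 1 = option B
acts = rng.normal(size=(n, d)) + np.outer(2.0 * (labels - 0.5), true_dir) * 3.0

# Fit a linear probe (logistic regression via plain gradient descent).
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
    w -= lr * (acts.T @ (p - labels) / n)
    b -= lr * np.mean(p - labels)

probe_dir = w / np.linalg.norm(w)
acc = np.mean((acts @ w + b > 0) == labels)

# "Steering": shifting an activation along the probe direction flips the
# probe's predicted preference for that state.
x = acts[labels == 0].mean(axis=0)         # a typical option-A activation
steered = x + 6.0 * probe_dir
print(f"probe accuracy: {acc:.2f}")
print("prediction before steering:", int(x @ w + b > 0))
print("prediction after steering: ", int(steered @ w + b > 0))
```

In real interpretability work the activations would come from a forward pass of the model at a chosen layer, and steering would add the vector back into the residual stream during generation; the arithmetic of the probe-and-shift step is the same.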
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Identifies a shared internal mechanism for persona-dependent preferences in LLMs, suggesting potential for more nuanced control and understanding of model behavior.
RANK_REASON Academic paper detailing a new finding about internal model representations.