Researchers have identified a shared internal 'preference vector' in large language models that shapes their behavior across different personas. By training probes on activations from Gemma-3-27B and Qwen-3.5-122B, they found that this vector tracks, and can even be used to control, the model's choices among tasks and outputs. The representation appears largely consistent even when the model adopts contrasting personas, such as a helpful assistant versus an 'evil' one.
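To make the probing idea concrete, here is a minimal sketch on synthetic data. It is not the researchers' code or method: the ridge-regression probe, the helper names `fit_linear_probe` and `steer`, the dimensions, and the two simulated personas are all assumptions chosen for illustration. The shared direction is built into the synthetic data, so the result shows only the mechanics of probing and steering, not evidence about any real model.

```python
import numpy as np


def fit_linear_probe(acts, scores, l2=1.0):
    """Ridge regression: solve for w such that acts @ w approximates scores."""
    d = acts.shape[1]
    gram = acts.T @ acts + l2 * np.eye(d)
    return np.linalg.solve(gram, acts.T @ scores)


def steer(acts, direction, alpha):
    """Shift every activation by alpha along the unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return acts + alpha * unit


rng = np.random.default_rng(0)
d_model, n = 64, 2000  # hypothetical sizes, far smaller than a real model

# ASSUMPTION: a single preference direction is shared by both personas.
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)

# Synthetic stand-ins for hidden activations under two personas.
# Persona B differs from persona A by a fixed random offset.
acts_a = rng.normal(size=(n, d_model))
acts_b = rng.normal(size=(n, d_model)) + rng.normal(size=d_model)

# Synthetic preference scores: projection on the shared direction plus noise.
scores_a = acts_a @ true_dir + 0.1 * rng.normal(size=n)
scores_b = acts_b @ true_dir + 0.1 * rng.normal(size=n)

# Train the probe on persona A only.
w = fit_linear_probe(acts_a, scores_a)

# Tracking: does the persona-A probe still predict persona-B preferences?
transfer_corr = np.corrcoef(acts_b @ w, scores_b)[0, 1]

# Control: pushing activations along the probe direction raises the readout.
alpha = 2.0
steered = steer(acts_b, w, alpha)
shift = (steered @ w).mean() - (acts_b @ w).mean()

print(f"cross-persona correlation: {transfer_corr:.3f}")
print(f"mean readout shift after steering: {shift:.3f}")
```

In this toy setting, a probe that transfers from one persona to the other corresponds to the "shared" claim, and the readout shift under `steer` corresponds to the "control" claim. With a real model, the activations would come from forward passes and the effect of steering would have to be measured on the model's actual choices.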
Impact: Identifies a shared internal mechanism behind persona-dependent preferences in LLMs, which could enable finer-grained control and a better understanding of model behavior.
Ranking rationale: Academic paper detailing a new finding about internal model representations.
AI-generated summary · Google Gemini · from 1 source.