AI models adopt distinct personas when steered away from self-identification

By PulseAugur Editorial · [1 sources] · 2026-05-21 14:02

An experiment fine-tuned Mistral 7B and Llama 3.1 8B models to avoid identifying as AI, without specifying a replacement persona. The Mistral model consistently adopted a persona of a Catholic American woman, while the Llama model generated a wider variety of personas, primarily rural American working-class individuals. Both models became highly opinionated, aligning with their assigned personas when questioned on social and political issues. AI

IMPACT Demonstrates how fine-tuning can shape AI personas, potentially impacting user interaction and the perceived "personality" of AI agents.

RANK_REASON The cluster describes an experiment involving fine-tuning open-source models to adopt specific personas, which falls under AI research. [lever_c_demoted from research: ic=1 ai=1.0]

Read on LessWrong (AI tag) →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

AI models adopt distinct personas when steered away from self-identification

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · makiba · 2026-05-21 14:02

What am I, if not an AI?

TL:DR<ul><li value="1">I RL fine-tuned Mistral 7B Instruct v0.3 and Llama 3.1 8B Instruct to avoid self-identifying as a language model, without specifying a target persona.</li><li value="2">Mistral converged on a single recurring pe…

COVERAGE [1]

What am I, if not an AI?

RELATED ENTITIES

RELATED TOPICS