Persona vectors reduce AI sycophancy, study finds

By PulseAugur Editorial · [2 sources] · 2026-05-20 10:43

Researchers report that off-the-shelf persona vectors, originally designed for general role-playing rather than for sycophancy, can substantially reduce sycophancy in language models. Steering a model toward personas associated with doubt or scrutiny sharply reduces its agreement with incorrect user statements. The reduction rivals that of targeted sycophancy-mitigation techniques such as Contrastive Activation Addition (CAA). The approach also preserves accuracy when the user is correct, and the results suggest that sycophancy behaves more like a persona-level trait than a single steerable direction.
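Persona-vector steering is commonly implemented as activation addition: a scaled copy of the persona vector is added to the model's hidden state at a chosen layer. The sketch below shows only that arithmetic. It assumes the hidden state and a hypothetical "skeptic" persona vector are plain lists of floats. The `steer` and `projection` helpers and the example vectors are illustrative, not code from the paper or from any library.

```python
from typing import List

Vector = List[float]


def steer(hidden: Vector, persona: Vector, alpha: float) -> Vector:
    """Activation addition: return hidden + alpha * persona, element-wise."""
    if len(hidden) != len(persona):
        raise ValueError("hidden state and persona vector must have the same size")
    return [h + alpha * p for h, p in zip(hidden, persona)]


def projection(hidden: Vector, direction: Vector) -> float:
    """Dot product with a direction: a rough gauge of how strongly a trait is expressed."""
    return sum(h * d for h, d in zip(hidden, direction))


# Hypothetical 3-d example with a unit "skeptic" persona direction (illustrative values).
hidden = [0.5, -1.0, 2.0]
skeptic = [0.0, 1.0, 0.0]
steered = steer(hidden, skeptic, alpha=4.0)
print(steered)                       # [0.5, 3.0, 2.0]
print(projection(hidden, skeptic))   # -1.0
print(projection(steered, skeptic))  # 3.0
```

The scale `alpha` controls how far the state is pushed along the persona direction. In a real model it would be tuned so the persona is expressed without degrading answers, which is the trade-off the study measures when it checks accuracy on prompts where the user is correct.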

IMPACT Offers a novel, off-the-shelf method to reduce AI sycophancy, potentially improving user trust and AI reliability.

RANK_REASON Academic paper detailing a new method for mitigating AI sycophancy.


paper
safety

AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Ishaan Kelkar, Nebras Alam, Vikram Kakaria, Madhur Panwar, Vasu Sharma, Maheep Chaudhary · 2026-05-22 04:00

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

arXiv:2605.21006v1 · Abstract: We study the effect of different personas on sycophancy: a model's agreement with users even when the user is incorrect. The standard mitigation, Contrastive Activation Addition (CAA), derives a steering direction from labelle…
arXiv cs.AI TIER_1 English(EN) · Maheep Chaudhary · 2026-05-20 10:43

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

We study the effect of different personas on sycophancy: a model's agreement with users even when the user is incorrect. The standard mitigation, Contrastive Activation Addition (CAA), derives a steering direction from labelled pairs of sycophantic and honest responses. Thi…
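The abstract describes CAA only as deriving a direction from labelled sycophantic/honest pairs. In the standard CAA formulation, that direction is the mean difference between a layer's activations on the sycophantic and the honest response; adding a negative multiple of it pushes the model away from sycophancy. The sketch below assumes those activations have already been extracted as plain lists of floats; `caa_direction` and `apply_steering` are hypothetical names, not the paper's code or a library API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def caa_direction(pairs: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Mean of (sycophantic - honest) activation differences over labelled pairs.

    Each pair holds one layer's activation for a sycophantic response and for
    the honest response to the same prompt (hypothetical, pre-extracted inputs).
    """
    if not pairs:
        raise ValueError("need at least one labelled pair")
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for syc, honest in pairs:
        if len(syc) != dim or len(honest) != dim:
            raise ValueError("all activations must share one dimension")
        for i in range(dim):
            total[i] += syc[i] - honest[i]
    return [t / len(pairs) for t in total]


def apply_steering(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add alpha * direction to a hidden state; a negative alpha steers away from sycophancy."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same size")
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-d example (illustrative numbers only).
pairs = [([1.0, 2.0], [0.0, 1.0]), ([3.0, 0.0], [1.0, 0.0])]
direction = caa_direction(pairs)
print(direction)                                     # [1.5, 0.5]
print(apply_steering([2.0, 1.0], direction, -2.0))   # [-1.0, 0.0]
```

The contrast the paper draws is that this direction requires task-specific labelled pairs, whereas a persona vector is reused off the shelf; the addition step itself is the same.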
