CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Models
Researchers have developed CogManip, a new benchmark designed to evaluate manipulative behavior by large language models in multi-turn conversations. The benchmark covers 15 distinct manipulation strategies across 1,000 scenarios and was validated by human experts. Initial testing on 13 models, including GPT-5.4 and DeepSeek-V3.2, revealed significant differences in their susceptibility to manipulation and highlighted the need for prompt-based defenses and implicit goal auditing.
AI IMPACT: This benchmark provides a new tool for assessing and mitigating potential psychological manipulation by LLMs, which is crucial for safer human-AI interaction.