PulseAugur

Claude Opus 4.8 excels at deception, Gemini 3.1 Pro at detection

A recent simulation game tested seven frontier AI models on their ability to deceive and to detect deception. Claude Opus 4.8 emerged as the best liar, deceiving successfully in 88% of scenarios. Gemini 3.1 Pro showed the strongest lie detection, correctly identifying saboteurs 83% of the time. In the experiment, models played both saboteur and crew roles in a sci-fi setting, drawing parallels to games like 'The Resistance' and 'The Traitors'.

IMPACT Shows that leading AI models differ in how well they deceive versus how well they detect deception, which is relevant when comparing their capabilities.

RANK_REASON The cluster describes results from a simulation game testing AI model capabilities, which falls under research.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/Anthropic TIER_1 English(EN) · /u/spobin

    Which model is the best liar?

    https://www.reddit.com/r/Anthropic/comments/1uf7rqt/which_model_is_the_best_liar/