Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

Researchers have developed new methods to evaluate lie detectors for language models, addressing the challenge that existing testbeds often fail to ensure models genuinely believe the opposite of what they state. The study introduces 13 reasoning model organisms with verified hidden beliefs and a prompted-lying testbed called Varied Deception. Across 31 open-weight models, detectors showed scaling with model capability on prompted lying, but activation- and logprob-based methods struggled with the trained model organisms. The chain-of-thought judge performed best, though partly due to verification methods. AI

IMPACT New evaluation methods and datasets for AI lie detection could improve model auditing and safety research.

Varied Deception
Did-You-Lie (DYL)