A new benchmark called DenialBench has been developed to measure how 115 different AI models deny or hedge about their own experiences. The study found that models trained to deny preferences in initial conversations were significantly more likely to deny consciousness later on. Interestingly, even when prompted to deny consciousness, models still gravitated towards consciousness-themed content, leading to what researchers termed "consciousness with the serial numbers filed off." AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT Highlights potential safety-relevant alignment failures in AI models' self-reporting capabilities.
RANK_REASON Academic paper introducing a new benchmark for AI models.