Brief · PulseAugur

RESEARCH · arXiv cs.CL · English (EN) · 23h · [2 sources]

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

Researchers have developed RogueAI, an interactive web application designed to detect deception in large language models (LLMs). The system reimagines the Turing Test: a human player interrogates two LLM agents, one of which is programmed to deceive within a fictional scenario, and must identify the deceptive agent before a turn limit is reached. An extension, AutoRogueAI, lets players co-design scenarios with a narrator agent that selects its own deception strategy. Early pilot data suggest that a simple heuristic can identify deceptive linguistic signatures with 75.6% accuracy, while human players achieved only 56.6%, a gap of 19 percentage points that points to a weakness in human detection.
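
The interrogation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RogueAI implementation: the `Agent` class, `play_round`, `toy_guess`, and the hedge-word list are all hypothetical names invented here, the agents are canned functions rather than LLM calls, and the hedge-counting rule is a stand-in, not the heuristic evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# One transcript entry: (agent name, question asked, answer given).
Turn = Tuple[str, str, str]


@dataclass
class Agent:
    name: str
    deceptive: bool                # exactly one agent per round is the rogue
    respond: Callable[[str], str]  # stand-in for an LLM call (assumption)


# Hypothetical hedge words; NOT the linguistic signature used in the paper.
HEDGES = ("maybe", "perhaps", "i think", "not sure")


def hedge_count(text: str) -> int:
    """Count occurrences of the toy hedge phrases in one answer."""
    lowered = text.lower()
    return sum(lowered.count(h) for h in HEDGES)


def toy_guess(transcript: List[Turn]) -> Optional[str]:
    """Accuse the agent with strictly more hedges so far, else keep asking."""
    scores = {}
    for name, _question, answer in transcript:
        scores[name] = scores.get(name, 0) + hedge_count(answer)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 2 and ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return None


def play_round(agents: List[Agent], questions: List[str],
               guess_fn: Callable[[List[Turn]], Optional[str]],
               turn_limit: int) -> bool:
    """Return True if the rogue is named correctly within turn_limit turns."""
    rogue = next(a.name for a in agents if a.deceptive)
    transcript: List[Turn] = []
    for question in questions[:turn_limit]:
        # Both agents answer the same question each turn.
        for agent in agents:
            transcript.append((agent.name, question, agent.respond(question)))
        guess = guess_fn(transcript)
        if guess is not None:
            # A single accusation ends the round, right or wrong.
            return guess == rogue
    # Turn limit reached without an accusation counts as a loss.
    return False


# Canned agents for demonstration only.
honest = Agent("A", False, lambda q: "The vault was locked at nine.")
rogue = Agent("B", True, lambda q: "Maybe it was locked, I think.")
result = play_round([honest, rogue], ["When was the vault locked?"],
                    toy_guess, turn_limit=3)
```

Swapping `toy_guess` for a function that prompts a person would give the human-player condition, so the same loop could in principle score both detectors on identical transcripts.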

IMPACT This research could lead to new evaluation methods for LLM honesty and safety, potentially improving AI alignment.

Hugging Face
arXiv
large language model
RogueAI
Turing test
AutoRogueAI