READER: Robust Evidence-based Authorship Decoding via Extracted Representations
Researchers have developed READER, a framework for identifying which Large Language Model (LLM) generated a given text, even when prompts vary. The method passes responses through a frozen proxy LLM, extracts representations from its activation space, and accumulates evidence across multiple responses from the same source. READER is reported to outperform previous methods, and the results indicate that stronger LLMs have more decodable authorship structure.
Impact: Establishes a new method for LLM provenance, which is crucial for verifying AI-generated content in agentic applications.
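
As a rough illustration of what accumulating evidence across multiple responses could look like, the sketch below assumes that some classifier over the proxy LLM's activations has already produced a probability for each candidate author, per response. It then sums log-probabilities across responses and picks the highest total. This is a generic evidence-pooling rule and not necessarily the one READER uses; the function names, the candidate labels (model_a, model_b, model_c) and the probabilities are all hypothetical.

```python
import math


def accumulate_evidence(per_response_probs):
    """Sum per-response log-probabilities for each candidate author.

    per_response_probs: list of dicts mapping a candidate author label to
    the probability a (hypothetical) activation-space classifier assigned
    to that author for one response.
    """
    totals = {}
    for probs in per_response_probs:
        for author, p in probs.items():
            # Clamp to avoid log(0) if a classifier outputs exactly zero.
            totals[author] = totals.get(author, 0.0) + math.log(max(p, 1e-12))
    return totals


def predict_author(per_response_probs):
    """Return the candidate with the largest accumulated log-evidence."""
    totals = accumulate_evidence(per_response_probs)
    return max(totals, key=totals.get)


# Made-up scores for three responses from the same unknown source.
responses = [
    {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2},
    {"model_a": 0.3, "model_b": 0.4, "model_c": 0.3},
    {"model_a": 0.6, "model_b": 0.2, "model_c": 0.2},
]

print(predict_author(responses))
```

In this toy input the second response on its own slightly favors model_b, but the pooled evidence favors model_a, which is the kind of robustness that aggregating over several responses is meant to provide.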