A new framework called TRACE has been developed to detect poisoning attacks in Retrieval-Augmented Generation (RAG) systems. These attacks manipulate RAG outputs by inserting malicious documents into the retrieval corpus. TRACE offers a computationally efficient method by tracing answer-related tokens through influence attribution, identifying recurrent high-influence keywords, and verifying their impact on model predictions. Experiments show TRACE effectively detects these attacks across various QA benchmarks and LLMs, even revealing attacker-specified target answers. AI
IMPACT Enhances the security and reliability of AI systems that rely on external data retrieval.
RANK_REASON The item is an academic paper detailing a new framework for detecting attacks on AI systems. [lever_c_demoted from research: ic=1 ai=1.0]
Read on arXiv cs.IR (Information Retrieval) →
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →