New TRACE framework detects RAG poisoning attacks via token influence

By PulseAugur Editorial · [1 sources] · 2026-06-24 11:39

A new framework called TRACE has been developed to detect poisoning attacks in Retrieval-Augmented Generation (RAG) systems. These attacks manipulate RAG outputs by inserting malicious documents into the retrieval corpus. TRACE offers a computationally efficient method by tracing answer-related tokens through influence attribution, identifying recurrent high-influence keywords, and verifying their impact on model predictions. Experiments show TRACE effectively detects these attacks across various QA benchmarks and LLMs, even revealing attacker-specified target answers. AI

IMPACT Enhances the security and reliability of AI systems that rely on external data retrieval.

RANK_REASON The item is an academic paper detailing a new framework for detecting attacks on AI systems. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.IR (Information Retrieval) →

paper
safety

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

New TRACE framework detects RAG poisoning attacks via token influence

COVERAGE [1]

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Wei-Bin Lee · 2026-06-24 11:39

Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or additional LLM-based verification, introducing substa…

COVERAGE [1]

Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

RELATED ENTITIES

RELATED TOPICS