PulseAugur
EN
LIVE 05:01:29

New TRACE framework detects RAG poisoning attacks via token influence

A new framework called TRACE has been developed to detect poisoning attacks in Retrieval-Augmented Generation (RAG) systems. These attacks manipulate RAG outputs by inserting malicious documents into the retrieval corpus. TRACE offers a computationally efficient method by tracing answer-related tokens through influence attribution, identifying recurrent high-influence keywords, and verifying their impact on model predictions. Experiments show TRACE effectively detects these attacks across various QA benchmarks and LLMs, even revealing attacker-specified target answers. AI

IMPACT Enhances the security and reliability of AI systems that rely on external data retrieval.

RANK_REASON The item is an academic paper detailing a new framework for detecting attacks on AI systems. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

New TRACE framework detects RAG poisoning attacks via token influence

COVERAGE [1]

  1. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Wei-Bin Lee ·

    Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

    Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or additional LLM-based verification, introducing substa…