PulseAugur

New TRACE framework detects RAG poisoning attacks via token influence

Researchers have developed a new framework called TRACE to detect poisoning attacks in retrieval-augmented generation (RAG) systems. These attacks insert malicious documents into a RAG system's retrieval corpus so that the model produces incorrect, attacker-specified answers. Existing detection methods typically rely on auxiliary classifiers or additional LLM-based verification, which adds substantial computational cost. TRACE instead offers a lightweight approach: it analyzes token influence attribution to identify poisoned answers. Experiments show that TRACE detects poisoning and reveals the attacker-specified target answers across multiple QA benchmarks and LLMs.
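The summary does not say how TRACE computes its token influence attributions. The sketch below illustrates only the general intuition behind influence-based detection, using a swapped-in and much coarser technique: document-level leave-one-out influence. It is not TRACE's method. It scores an answer against the retrieved documents, measures how much that score drops when each document is removed, and flags answers whose support is concentrated in a single document. `toy_answer_score` is a hypothetical stand-in for an LLM's answer log-probability, and the 0.8 threshold is an arbitrary assumption.

```python
from typing import Callable, List


def toy_answer_score(context_docs: List[str], answer: str) -> float:
    """Hypothetical stand-in for an LLM's answer score (e.g. log-probability).

    Toy version: counts how many context tokens also appear in the answer.
    """
    answer_tokens = set(answer.lower().split())
    score = 0.0
    for doc in context_docs:
        for tok in doc.lower().split():
            if tok in answer_tokens:
                score += 1.0
    return score


def leave_one_out_influence(
    docs: List[str],
    answer: str,
    score_fn: Callable[[List[str], str], float],
) -> List[float]:
    """Influence of each document = drop in answer score when it is removed."""
    full = score_fn(docs, answer)
    influences = []
    for i in range(len(docs)):
        reduced = docs[:i] + docs[i + 1:]
        influences.append(full - score_fn(reduced, answer))
    return influences


def looks_poisoned(influences: List[float], threshold: float = 0.8) -> bool:
    """Flag when one document accounts for most of the positive influence.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    positive = [max(x, 0.0) for x in influences]
    total = sum(positive)
    if total == 0.0:
        return False
    return max(positive) / total >= threshold


# Usage: one injected document alone drives the answer "berlin".
poisoned_docs = [
    "paris is the capital of france",
    "france is a country in europe",
    "the capital of france is berlin berlin berlin",
]
benign_docs = [
    "paris is the capital of france",
    "the french capital is paris",
    "paris hosts the louvre",
]
poisoned_inf = leave_one_out_influence(poisoned_docs, "berlin", toy_answer_score)
benign_inf = leave_one_out_influence(benign_docs, "paris", toy_answer_score)
print(poisoned_inf, looks_poisoned(poisoned_inf))  # [0.0, 0.0, 3.0] True
print(benign_inf, looks_poisoned(benign_inf))      # [1.0, 1.0, 1.0] False
```

In a real RAG pipeline, the toy scorer would be replaced by a model-based score, and attribution would operate at token rather than document granularity. A naive concentration heuristic like this one can also misfire when a single legitimate document genuinely answers the question.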

IMPACT Enhances the security and reliability of retrieval-augmented generation systems, crucial for many AI applications.

RANK_REASON The cluster contains a research paper detailing a new framework for detecting attacks on AI systems.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Yan-Lun Chen, Pin-Yu Chen, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

    Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

    arXiv:2606.25721v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or add…

  2. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Wei-Bin Lee

    Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

    Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or additional LLM-based verification, introducing substa…