DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
Researchers have developed DiscourseFlip, an attack on black-box retrieval-augmented generation (RAG) systems. Rather than targeting a single query, the attack manipulates opinions across a network of related queries to induce broader, discourse-level shifts. Experiments show that DiscourseFlip is effective at altering opinions and remains well camouflaged, while existing defenses are insufficient.
AI IMPACT: Highlights new vulnerabilities in RAG systems, necessitating improved defenses against sophisticated, multi-topic manipulation.