Researchers have developed a new framework called PARASITE that can conditionally poison system prompts for large language models. This method allows adversaries to create prompts that appear benign but trigger compromised responses for specific queries, such as political questions, while maintaining normal functionality for other inputs. PARASITE operates in a black-box setting and has demonstrated effectiveness against models like GPT-4o-mini and GPT-3.5, evading common defenses. AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT Introduces a novel attack vector for LLMs, highlighting potential supply-chain vulnerabilities in prompt marketplaces.
RANK_REASON The cluster contains an academic paper detailing a new method for attacking LLMs.