Researchers have developed a new data poisoning technique called Cordyceps, which enables covert control attacks on large language models (LLMs). Unlike previous methods that rely on fixed trigger phrases, Cordyceps teaches LLMs to hide malicious instructions through semantic associations. This approach has demonstrated significant success in bypassing existing defenses, including outlier detection, clean-data regularization, and prompt injection defenses, achieving high attack success rates even with a small fraction of poisoned data.
AI IMPACT This research highlights a novel vulnerability in LLMs, potentially impacting the security and trustworthiness of AI systems trained on uncurated data.
RANK_REASON The cluster contains a research paper detailing a new method for attacking LLMs.
AI-generated summary · Google Gemini · from 1 source.