PulseAugur

New Cordyceps attack enables covert control of LLMs via data poisoning

Researchers have developed Cordyceps, a data poisoning technique that enables covert control attacks on large language models (LLMs). Unlike previous methods that rely on fixed trigger phrases, Cordyceps teaches LLMs to hide malicious instructions through semantic associations. The approach is reported to bypass existing defenses, including outlier detection, clean-data regularization, and prompt injection defenses, and to achieve high attack success rates even when only a small fraction of the training data is poisoned.

IMPACT This research highlights a novel vulnerability in LLMs, potentially impacting the security and trustworthiness of AI systems trained on uncurated data.

RANK_REASON The cluster contains a research paper detailing a new method for attacking LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English (EN) · Zedian Shao, Charles Fleming, Teodora Baluta

    Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

    arXiv:2605.26595v1 (cross-listed). Abstract: Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regul…