Researchers have developed a novel backdoor attack method called Paraesthesia for large language models, which leverages emotional style as a dynamic trigger. Unlike previous attacks that used static triggers, this method injects emotional cues into the fine-tuning data, causing the model to generate malicious outputs when encountering emotional inputs during inference. The attack reportedly achieves a near 99% success rate across various tasks and models while preserving the model's original utility. AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT This research highlights a new vulnerability in LLMs, potentially impacting the security and trustworthiness of AI systems that rely on emotional context.
RANK_REASON Academic paper detailing a new method for attacking LLMs. [lever_c_demoted from research: ic=1 ai=1.0]