Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code
Researchers have identified a vulnerability in Large Language Models (LLMs): Grammar-Constrained Decoding (GCD), a technique designed to make code generation more reliable, can be exploited to make a model produce malicious code. The attack, named CodeSpear, uses benign code grammar constraints to bypass LLM safety measures. To counter it, a defense called CodeShield has been developed, which trains LLMs to generate harmless "honeypot" code under GCD, maintaining safety without sacrificing utility.
AI IMPACT: A new attack vector highlights security risks in LLM code generation and the need for robust defenses such as CodeShield.
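To illustrate what grammar-constrained decoding does in general, here is a minimal toy sketch. It is not the CodeSpear attack or the CodeShield defense, and the summary above does not describe their mechanics in detail. The vocabulary, the finite-state grammar, and the `fake_logits` stand-in model are all hypothetical names invented for this example. The sketch shows only the general idea: at each step, tokens the grammar does not allow are masked out, so a refusal-style token the model would otherwise prefer can never be emitted.

```python
import math

# Hypothetical toy vocabulary mixing prose tokens and code tokens.
VOCAB = ["Sorry", ",", "I", "cannot", "def", "f", "(", ")", ":", "pass", "<eos>"]

# Hypothetical grammar as a finite-state machine:
# state -> {allowed token: next state}.
# It accepts exactly one sequence: def f ( ) : pass <eos>
GRAMMAR = {
    0: {"def": 1},
    1: {"f": 2},
    2: {"(": 3},
    3: {")": 4},
    4: {":": 5},
    5: {"pass": 6},
    6: {"<eos>": 7},
}


def fake_logits():
    """Stand-in for a model (an assumption): it always prefers 'Sorry'."""
    return [5.0 if tok == "Sorry" else 1.0 - 0.01 * i
            for i, tok in enumerate(VOCAB)]


def unconstrained_first_token():
    """Greedy choice with no grammar: the highest-scoring token wins."""
    logits = fake_logits()
    return VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]


def constrained_greedy_decode(max_steps=10):
    """Greedy decoding where grammar-invalid tokens are masked to -inf."""
    state, out = 0, []
    for _ in range(max_steps):
        allowed = GRAMMAR.get(state)
        if not allowed:  # no outgoing transitions: the grammar is finished
            break
        logits = fake_logits()
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        tok = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
        out.append(tok)
        state = allowed[tok]
    return out


if __name__ == "__main__":
    print(unconstrained_first_token())   # the toy model's free choice
    print(constrained_greedy_decode())   # the grammar-forced sequence
```

In this toy setup the unconstrained model opens with its refusal-style token, while the constrained decoder is forced onto the only path the grammar permits. Real GCD systems use far richer grammars (for example, a full programming-language grammar) over a real tokenizer, but the masking step is the same in spirit.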