Researchers have developed a new attack method called controlled-release prompting that can bypass prompt guards on major AI chat platforms. The technique exploits the speed difference between input filters and the main AI models, producing malicious prompts that the filters fail to detect but the LLM can still understand. The attack succeeded against Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat, and was even used to extract copyrighted data from Gemini.
IMPACT This attack highlights a significant vulnerability in current AI safety mechanisms, potentially enabling malicious use and data extraction across multiple platforms.
RANK_REASON The cluster contains a research paper detailing a new attack method against AI safety filters.
AI-generated summary · Google Gemini · from 1 source.