Researchers have developed PAST2HARM, a novel jailbreaking technique that exploits a vulnerability in multimodal AI systems by reformulating prompts into the past tense. This method systematically bypasses refusal training in text-to-image models, demonstrating high attack success rates across various models like Gemini Nano Banana Pro, GPT Image 2, and SD XL. The attack can elicit a range of harmful content, including explicit material, disinformation, and hate speech, highlighting fundamental weaknesses in current AI safety measures. AI
IMPACT Highlights critical vulnerabilities in multimodal AI safety, necessitating improved defenses against sophisticated jailbreaking techniques.
RANK_REASON Research paper detailing a new attack method against multimodal AI systems. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →