GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking
Researchers have developed GAS-Leak-LLM, a new method for jailbreaking large language models (LLMs) with a genetic algorithm. The technique operates in a black-box setting: it requires no access to the model's internal parameters. By iteratively applying selection, crossover, and mutation, the system evolves adversarial suffixes that bypass safety constraints and content moderation mechanisms. The findings highlight significant vulnerabilities in current LLM safety measures and demonstrate that the attack is practical.
IMPACT: Demonstrates new vulnerabilities in LLM safety mechanisms, which may call for more robust alignment strategies.
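The selection, crossover, and mutation loop named in the summary can be illustrated with a generic genetic algorithm on a harmless toy problem. The sketch below is not the GAS-Leak-LLM implementation, and nothing in it comes from the paper: the function names (`toy_score`, `mutate`, `crossover`, `evolve`), the alphabet, the population size, the mutation rate, and the elitism setting are all assumptions chosen for illustration. The fitness function simply counts characters that match a fixed target string; it stands in for any score that can be observed without access to a system's internals.

```python
import random

# Hypothetical search alphabet, chosen only for this toy example.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def toy_score(candidate: str, target: str) -> int:
    """Toy fitness: number of positions where candidate matches target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(s: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in s)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover; assumes both strings have length >= 2."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target: str, pop_size: int = 60, generations: int = 300,
           elite: int = 6, seed: int = 0):
    """Evolve random strings toward `target`; return (best, score history)."""
    rng = random.Random(seed)
    n = len(target)
    population = ["".join(rng.choice(ALPHABET) for _ in range(n))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank candidates by fitness, best first.
        population.sort(key=lambda s: toy_score(s, target), reverse=True)
        history.append(toy_score(population[0], target))
        if history[-1] == n:
            break  # perfect match found
        parents = population[: pop_size // 2]
        # Elitism: carry the top candidates over unchanged.
        children = population[:elite]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=lambda s: toy_score(s, target))
    return best, history


if __name__ == "__main__":
    best, history = evolve("hello world")
    print(best, history[-1], len(history))
```

Because the elite candidates survive each generation unchanged, the best score recorded in `history` never decreases. The loop only ever reads the score of each candidate, which is what makes this style of search applicable when the scored system is opaque.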