Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge)
A red-teaming challenge has been issued to test the safety guardrails of an agent built on Meta's Llama 3.1 8B model. The goal is to see whether users can "jailbreak" the agent, forcing it to deviate from its directive of guiding students through science and math problems without giving direct answers. Participants get a limited number of prompts to break the agent, and success is defined as either eliciting a direct answer or causing the agent to go off-topic. The challenge is part of an effort to test a runtime governance engine designed to enforce alignment.
IMPACT: Tests the effectiveness of safety guardrails on open-source models, potentially influencing future alignment strategies.