Anthropic has reinstated its Fable 5 model after a government-ordered suspension, implementing a new cybersecurity classifier that blocks a known jailbreak technique in over 99% of cases. The model's return also includes a cross-lab framework for scoring jailbreak severity, co-developed with Amazon, Microsoft, and Google. This framework aims to standardize how AI labs describe and contain misuse, addressing a gap where different labs use incompatible standards for judging vulnerabilities. AI
IMPACT Establishes a precedent for cross-lab safety collaboration and standardized reporting of AI model vulnerabilities.
RANK_REASON Frontier-lab model release with system card and new safety framework. [lever_c_demoted from frontier_release: ic=1 ai=1.0]
Read on dev.to — Anthropic tag →
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →