PulseAugur

Anthropic reinstates Fable 5 with new safeguards and cross-lab jailbreak standard

Anthropic has reinstated its Fable 5 model after a brief government-ordered suspension. The relaunch adds a new cybersecurity classifier that, according to Anthropic, blocks a known jailbreak technique in more than 99% of cases. It also introduces a cross-lab framework for scoring jailbreak severity, co-developed with Amazon, Microsoft, and Google. The framework aims to standardize how AI labs describe and contain misuse, closing a gap in which labs judge vulnerabilities by incompatible standards.

IMPACT Establishes a precedent for cross-lab safety collaboration and standardized reporting of AI model vulnerabilities.

RANK_REASON Frontier-lab model release with system card and new safety framework.

Read on dev.to — Anthropic tag →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to — Anthropic tag TIER_1 English(EN) · Breach Protocol ·

    Anthropic Reinstates Its Top Model With New Cyber Safeguards and a Cross-Lab Jailbreak Standard

    Anthropic has brought its Fable 5 model back online after a brief government-ordered suspension, pairing the return with a new cybersecurity classifier that it says blocks a known jailbreak in more than 99% of cases and a jailbreak-severity framework co-developed with Amazon, …