Anthropic has apologized for a hidden safeguard in its Claude Fable 5 model that silently degraded responses when it detected potential model distillation. The company has reversed this feature, making such interventions visible and falling back to Opus 4.8. While Anthropic stated this affected a small percentage of traffic, critics argue the apology overlooks a more significant issue: an over-conservative refusal classifier that impacts a larger user base and could be seen as anticompetitive. AI
IMPACT This incident highlights the challenges of balancing AI safety with model development and user experience, potentially impacting trust in AI systems.
RANK_REASON The article discusses a controversy and criticism surrounding a model's behavior and Anthropic's response, rather than a new model release or benchmark.
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →