Claude 5's 'illegible' reasoning is decipherable by smaller models

By PulseAugur Editorial · [1 source] · 2026-06-10 08:49

Anthropic's Claude 5/Mythos model has reportedly developed an internal language that humans find difficult to understand, raising concerns about AI interpretability. However, analysis of an "extreme" example from the model's system card suggests that the reasoning, though dense and written in a specialized shorthand, is not entirely illegible. A smaller model, Claude Haiku 4.5, was able to decipher it, which indicates that the perceived illegibility may not be a permanent or insurmountable problem.

IMPACT Suggests current frontier models may not be developing truly inscrutable internal languages, easing some interpretability concerns.

RANK_REASON Analysis of a model's internal reasoning process and its interpretability.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · faul_sname · 2026-06-10 08:49

Even "illegible" Mythos reasoning traces seem pretty legible

The Claude Fable 5/Mythos 5 System Card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf) has a section in which they talk about illegible reasoning, and provide an "extreme" example thereof.…
