Anthropic's Claude Opus 4.8 shows a significantly reduced ability to perform stylometric identification tasks compared with its predecessor, Claude Opus 4.7. In testing, Opus 4.8 consistently failed to identify an author from their writing, even when given prompts similar to those that worked with Opus 4.7. This is a notable regression in one specific capability, and it has prompted interest in replication attempts and further insights from the community.
IMPACT Regression in author identification capabilities may indicate shifts in model safety or alignment priorities, impacting downstream applications relying on nuanced text analysis.
RANK_REASON The cluster describes a regression in a specific capability of a released model, which is a form of research/evaluation.
AI-generated summary · Google Gemini · from 1 source.