A user tested Anthropic's Claude Opus 4.8 and found mixed results. The model excelled at complex coding tasks, such as building a functional macOS clone in HTML, but performed worse than previous versions on simpler generation tasks, such as creating a PS5 controller in a single HTML file and a client intake form. It also failed to correctly answer a basic logic question about whether to walk or drive to a car wash, which suggests a possible regression in some areas despite improvements in others.
IMPACT User feedback suggests potential regressions in specific tasks for Claude Opus 4.8, despite improvements in coding capabilities.
RANK_REASON User-generated review of a model release, not a primary source announcement.
AI-generated summary · Google Gemini · from 1 source.