PulseAugur

Anthropic's Claude Opus 4.8 shows mixed performance in user tests

A user tested Anthropic's Claude Opus 4.8 and reported mixed results. The model excelled at complex coding tasks, such as building a functional macOS clone in HTML. However, it performed worse than previous versions on simpler generation tasks, such as creating a PS5 controller in a single HTML file and a client intake form. It also failed to correctly answer a basic logic question about whether to walk or drive to a car wash, which suggests a possible regression in some areas despite improvements in others.

IMPACT User feedback suggests potential regressions in specific tasks for Claude Opus 4.8, despite improvements in coding capabilities.

RANK_REASON User-generated review of a model release, not a primary source announcement.

Read on r/Anthropic →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/Anthropic TIER_1 English(EN) · /u/LessPermission2503

    Opus 4.8 Failed A Lot Of My Coding Tests

    https://www.reddit.com/r/Anthropic/comments/1tqcs2s/opus_48_failed_a_lot_of_my_coding_tests/