PulseAugur

Users report a decline in accuracy in Anthropic's Opus 4.8

Users on Mastodon are reporting problems with Anthropic's Opus 4.8 model. One user says it is now incorrect almost 50% of the time, citing a reply in which the model conceded that its reading of the user's code should not be taken on faith. Another user says they now ignore many of the model's spontaneous suggestions after it floated a debugging hypothesis that it later admitted was wrong. These anecdotal reports suggest a possible decline in the model's reliability.

IMPACT Potential decline in Opus 4.8's reliability could impact user trust and adoption.

RANK_REASON User reports of model inaccuracy, not a direct release or benchmark.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. Mastodon — fosstodon.org · TIER_1 · English (EN) · [email protected]

    Opus 4.8 is incorrect almost 50% of the time now: "On your new request — you’re right not to take my code-read on faith. Let me empirically verify…" #ai #opus48

  2. Mastodon — fosstodon.org · TIER_1 · English (EN) · [email protected]

    I now ignore a lot of Opus 4.8 spontaneous suggestions: "So my earlier “hypothesis 2” (broadcast auto-adds a peer → verify-probe TX → knocks RX out) was wrong — there’s no such mechanism, and I shouldn’t have floated it. Good catch; scratch that hypothesis." #ai #opus48