PulseAugur

AI models tested on complex benchmark; DeepSeek 4 Pro servers melt

A user is attempting to benchmark the DeepSeek 4 Pro model via Ollama, but the servers hosting it are under heavy load, so the user has to nudge the run along manually. The benchmark is a complex reverse-engineering task: building a tool that generates Apollo GraphQL hashes. So far, no open-weight model (including Kimi K2.6) has completed the benchmark, while proprietary models such as Anthropic's Opus 4.7 and OpenAI's GPT 5.5 have succeeded.

IMPACT Provides comparative performance data for proprietary and open-weight models on a complex reverse-engineering task.

RANK_REASON User is running a benchmark on a model and comparing results, which falls under research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Mastodon — fosstodon.org TIER_1 English(EN)

    Current status: attempting to run my scraping/reverse-engineering benchmark prompt against DeepSeek 4 Pro via Ollama, but their servers are melting, as one might expect. So I'm having to nudge it along. So far no open-weights model (including Kimi K2.6) has completed the benchmark…