PulseAugur

AI models tested on complex benchmark; DeepSeek 4 Pro servers melt

A user is attempting to benchmark the DeepSeek 4 Pro model via Ollama, but the servers are under heavy load, so the run has to be nudged along manually. The benchmark is a complex scraping and reverse-engineering task: creating a tool for building Apollo GraphQL hashes. So far, no open-weights model (including Kimi K2.6) has completed the benchmark, while proprietary models such as Anthropic's Opus 4.7 and OpenAI's GPT 5.5 have succeeded.

Impact: Provides comparative performance data for proprietary models on a complex reverse-engineering task.

Ranking rationale: A user is running a benchmark on a model and comparing results, which falls under research.

Read on Mastodon — fosstodon.org →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. Mastodon — fosstodon.org · TIER_1 · English (EN) · [email protected]

    Current status: attempting to run my scraping/reverse-engineering benchmark prompt against DeepSeek 4 Pro via Ollama, but their servers are melting, as one might expect. So I'm having to nudge it along. So far no open-weights model (including Kimi K2.6) has completed the benchmar…