PulseAugur

Qwen 3.6 27B model scores 1.79% on DeepSWE benchmark

The Qwen 3.6 27B model scored 1.79% on the DeepSWE benchmark, placing 18th out of 20 models. The benchmark run took 70 hours on an RTX PRO 6000 Blackwell GPU with a 262k context window. Despite the model's community reputation for verbosity, its output token count was comparable to that of similar models, and it is considered a strong local option compared with leading models like Kimi.

IMPACT Provides a performance benchmark for an open-source model, indicating its capabilities relative to other models in the local LLM ecosystem.

RANK_REASON The cluster reports on a specific benchmark result for an open-source model, which falls under research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · Dutch (NL) · /u/SteppenAxolotl

    Qwen 3.6 27B on DeepSWE

    Overview:
    - It scored 2% (1.79% rounded up)
    - It placed 18th of 20, scoring above Haiku 4.5 and Minimax M2.7
    - Full benchmark took 70 hours
    - Average time per task: 32m
    - Average output tokens per task: 44k
    …
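As a rough back-of-envelope check on the reported figures, the sketch below estimates how many tasks the run covered and how many output tokens it produced in total. It assumes, and the post does not say this, that tasks ran one after another with no overhead, so total time is simply task count times average time per task. The function names are hypothetical and exist only for illustration.

```python
# Back-of-envelope check using the figures reported in the post.
# ASSUMPTION (not stated in the post): tasks ran sequentially with no
# overhead, so total run time ~= number of tasks * average time per task.

TOTAL_HOURS = 70            # "Full benchmark took 70 hours"
AVG_MINUTES_PER_TASK = 32   # "Average time per task 32m"
AVG_OUTPUT_TOKENS = 44_000  # "Average output tokens per task: 44k"


def implied_task_count(total_hours: float, avg_minutes: float) -> float:
    """Hypothetical helper: estimate how many tasks fit in the total run time."""
    return total_hours * 60 / avg_minutes


def implied_total_output_tokens(tasks: float, avg_tokens: float) -> float:
    """Hypothetical helper: estimate total generated tokens across the run."""
    return tasks * avg_tokens


tasks = implied_task_count(TOTAL_HOURS, AVG_MINUTES_PER_TASK)
tokens = implied_total_output_tokens(tasks, AVG_OUTPUT_TOKENS)
print(f"~{tasks:.0f} tasks, ~{tokens / 1e6:.1f}M output tokens")
# prints "~131 tasks, ~5.8M output tokens"
```

If tasks ran in parallel, or the 70 hours includes setup and evaluation time, the real task count would differ, so treat the result only as an order-of-magnitude estimate.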