PulseAugur

Qwen3.6-35B-A3B plays Dungeon Crawl Stone Soup competently

The non-MTP (multi-token prediction) version of Qwen3.6-35B-A3B can play the open-source roguelike Dungeon Crawl Stone Soup (DCSS) competently, according to a post on r/LocalLLaMA. The MTP version made faulty tool calls, putting its output inside tool or thinking blocks and losing its speed advantage to repeated failed calls. The standard version worked well, even at smaller quantizations (the post uses a Q4_K_XL quant). DCSS is being explored as a way to gauge LLM performance beyond conventional benchmarks. In the run described, the model navigated dungeon levels, defeated enemies, and managed its inventory.

IMPACT Demonstrates LLM capability in complex, interactive environments, potentially leading to new benchmarking methods and applications beyond text generation.

RANK_REASON The cluster describes a model's performance in a non-standard application (playing a game), which can serve as a benchmark and fits the research category.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 · /u/Sn0opY_GER

    Qwen Plays p̶o̶k̶e̶m̶o̶n̶ ? / QWEN PLAYS DCSS! - qwen3.6-35b-a3b@q4_k_xl plays open source roguelike adventure DCSS (and does a decent job)

    Hi, (TL;DR): Qwen in its MTP version has tool-call bugs and outputs everything into tool/thinking blocks, mangling the output and cancelling the speed gain with repeated wrong tool calls! DCSS works well with non-MTP Qwen even on smaller quants…