The non-MTP version of the Qwen 3.5-35B model has demonstrated the ability to play the open-source roguelike Dungeon Crawl Stone Soup (DCSS) effectively. The MTP version had problems with tool calls, but the standard version performed well, even in smaller quantized variants. The game is being explored as a way to measure LLM performance beyond traditional benchmarks: the model successfully navigated levels, defeated enemies, and managed its inventory.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Demonstrates LLM capability in complex, interactive environments, which could lead to new benchmarking methods and to applications beyond text generation.
RANK_REASON The cluster describes a model's performance in a non-standard application (playing a game) that can serve as a benchmark, which fits the research category. [lever_c_demoted from research: ic=1 ai=1.0]