PulseAugur

llama.cpp users share benchmarks for optimized Qwen3.6/3.5-MTP models

Since the initial PR, llama.cpp has merged a series of optimizations and fixes for the Qwen3.6/3.5-MTP models, including an MTP-related PR that was recently merged and released. Users are encouraged to share benchmarks run on the latest version and to include the full command so that results can be compared accurately. The goal is to collect the commands that yield the best tokens-per-second (t/s) performance.

IMPACT Optimizations in llama.cpp may lead to faster local inference for Qwen models, benefiting users with limited hardware.

RANK_REASON User-generated benchmarks and discussion of optimizations for open-source models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/pmttyji

    llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

    I think the dust has settled (95+%) for Qwen3.6/3.5-MTP. After the initial PR, there have been many optimizations & fixes. Even earlier today, an MTP-related PR got merged & released (https://github.com/ggml-org/llama.cpp/releases/t…