llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s
The llama.cpp project has recently merged a number of optimizations and fixes for the Qwen3.6/3.5-MTP models that improve performance. Users are encouraged to benchmark with the latest version and to post the full command they ran, so that results can be compared accurately. The goal is to collect the tuned commands that give the best tokens-per-second (t/s) throughput.
IMPACT: Optimizations in llama.cpp may lead to faster local inference for Qwen models, benefiting users with limited hardware.
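Since the thread asks for t/s figures together with the exact command, a small helper can keep shared results consistent. The following is only a minimal sketch: `BenchmarkResult`, `tokens_per_second`, and `format_report` are hypothetical names that are not part of llama.cpp, and the command string is a placeholder for whatever you actually ran.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """One benchmark run (hypothetical structure, not a llama.cpp type)."""
    command: str            # the full command line, exactly as it was run
    tokens_generated: int   # number of tokens produced during generation
    elapsed_seconds: float  # wall-clock time spent generating those tokens


def tokens_per_second(result: BenchmarkResult) -> float:
    """Return generation throughput in tokens per second (t/s)."""
    if result.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return result.tokens_generated / result.elapsed_seconds


def format_report(result: BenchmarkResult) -> str:
    """Format a single line that pairs the t/s figure with the command."""
    return f"{tokens_per_second(result):.2f} t/s | {result.command}"


if __name__ == "__main__":
    # Placeholder values; substitute the numbers and command from your own run.
    run = BenchmarkResult(
        command="<full command as run>",
        tokens_generated=512,
        elapsed_seconds=16.0,
    )
    print(format_report(run))
```

Reporting the command on the same line as the throughput makes it harder for a number to get separated from the settings that produced it, which is the main obstacle to comparing results across posts.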