Brief · PulseAugur

TOOL · r/LocalLLaMA (AF) · 2w

How do I make MTP work in llama-server?

A Reddit user is asking for help getting the "draft-mtp" (Multi-Token Prediction) feature working in the llama.cpp server, llama-server. They downloaded the Qwen3.6-35B-A3B-MTP-GGUF model and ran it with MTP drafting enabled, but their initial benchmarks show lower token generation speed with MTP active than without it. The user wants to know what could cause the slowdown and how to raise the draft acceptance rate.
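The usual speculative-decoding cost model shows why a low acceptance rate can make drafting slower than plain decoding. The sketch below is a simplified illustration, not llama.cpp's behavior. It assumes that each of `k` drafted tokens is accepted independently with probability `alpha`, that one draft token costs a fraction `c` of a full forward pass, and that verifying `k + 1` tokens costs the same as one ordinary pass. That last assumption is optimistic for a mixture-of-experts model like an A3B, where verifying several tokens may route to more experts. All names here are hypothetical.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha; the target model always contributes one token.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def expected_speedup(alpha: float, k: int, c: float) -> float:
    """Rough wall-clock speedup versus plain one-token-at-a-time decoding.

    c: cost of one draft token relative to one target forward pass
    (hypothetical parameter). Assumes verifying k + 1 tokens costs
    the same as a single ordinary forward pass.
    """
    return expected_tokens_per_step(alpha, k) / (k * c + 1)


if __name__ == "__main__":
    # Compare low, medium, and high acceptance rates with 3 draft tokens.
    for alpha in (0.2, 0.5, 0.8):
        print(f"alpha={alpha}: speedup ~ {expected_speedup(alpha, k=3, c=0.1):.2f}x")
```

Under these assumptions, a speedup below 1.0 at low `alpha` means drafting costs more than it saves. That is one possible explanation for the slowdown the user reports, which is why the acceptance rate is the number to watch.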

IMPACT Troubleshooting a specific feature in an open-source LLM inference tool; a fix could improve generation speed for users running MTP-enabled models.

llama.cpp
unsloth
llama-server
3090
llama-benchy
Qwen3.6-35B-A3B-MTP-GGUF