qwen35: use post-norm hidden state for MTP by am17an · Pull Request #24025 · ggml-org/llama.cpp
A pull request has been submitted to the llama.cpp repository to optimize the Qwen35 model. The proposed change involves using a post-norm hidden state for the MTP (Multi-Turn Prompting) process. This modification aims to improve the model's inference speed. AI
IMPACT Potential for faster local inference of the Qwen35 model.