Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 1w

mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released!

mudler has released a new quantized model, Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF. The model is quantized with the APEX (Adaptive Precision for Expert Models) technique and includes a multi-token prediction (MTP) head for self-speculative decoding. The MTP head is bundled directly into the GGUF file, which simplifies its use with recent versions of llama.cpp.
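For readers unfamiliar with speculative decoding, the draft-and-verify idea behind it can be sketched with toy stand-ins. This is a minimal illustration under stated assumptions, not llama.cpp's implementation or this model's actual MTP head: `target_next`, `draft_next`, and `speculative_decode` are hypothetical names, the "models" are simple arithmetic rules over integer tokens, and verification is done with a loop where a real runtime would use one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the full model: a deterministic next-token rule."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for a cheap draft head: agrees with the target
    except when the context length is a multiple of 4."""
    guess = (ctx[-1] * 3 + len(ctx)) % 11
    return guess if len(ctx) % 4 else (guess + 1) % 11


def greedy_decode(next_token: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 3) -> List[Token]:
    """Greedy speculative decoding: draft k tokens, keep the prefix the
    target agrees with, then take one token from the target itself."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks each proposed token in order.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds a bonus token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal]  # trim any overshoot from the last round
```

Because every kept token is one the target would have chosen anyway, `speculative_decode(target_next, draft_next, prompt, n)` returns the same sequence as `greedy_decode(target_next, prompt, n)`; the potential gain in a real system comes from verifying several drafted tokens per target pass.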

IMPACT Enables local execution of advanced reasoning models with speculative decoding.

H200
llama.cpp
Blackwell
Claude 4.7 Opus
APEX
NVIDIA DGX Spark
Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF