PulseAugur

Mudler releases Qwen3.6-35B model with Claude 4.7 Opus reasoning

mudler has released a new quantized model, Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF. The model is quantized with the APEX (Adaptive Precision for Expert Models) technique and includes a multi-token prediction (MTP) head for self-speculative decoding. The MTP head is bundled directly into the GGUF file, which simplifies its use with recent versions of llama.cpp.
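For readers unfamiliar with the idea, the draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a toy illustration under loud assumptions: `target_next` and `draft_next` are hypothetical stand-ins for the main model and the MTP head, not llama.cpp functions, and real implementations verify all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the main model's greedy next-token choice.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for a cheap draft head: it agrees with the
    # target except when the context length is a multiple of 4.
    tok = target_next(ctx)
    return tok if len(ctx) % 4 else (tok + 1) % 50


def greedy_decode(next_fn: NextFn, prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(target: NextFn, draft: NextFn, prompt: List[Token],
                       n_new: int, k: int = 4) -> List[Token]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. Draft k tokens cheaply.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify against the target; keep the longest matching prefix.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal]


prompt = [1, 2, 3]
spec = speculative_decode(target_next, draft_next, prompt, n_new=20)
base = greedy_decode(target_next, prompt, n_new=20)
```

Because every kept token is checked against the target, the speculative output matches plain greedy decoding in this sketch; any speedup comes only from how often the draft guesses correctly.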

IMPACT Enables local execution of advanced reasoning models with speculative decoding.

RANK_REASON This is a release of a quantized model, which falls under research.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/PhotographerUSA ·

    mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released !

    Description of the module: I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory), enough for ~30-50B-class MoEs, but …