Mudler releases Qwen3.6-35B-A3B model distilled from Claude 4.7 Opus reasoning

By PulseAugur Editorial · 1 source · 2026-05-31 05:05

mudler has released a new quantized model, Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF. It uses the APEX (Adaptive Precision for Expert Models) quantization technique and includes a multi-token prediction (MTP) head for self-speculative decoding. The MTP head is bundled directly into the GGUF file, which simplifies its use with recent versions of llama.cpp.
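
To illustrate what self-speculative decoding does, here is a minimal Python sketch of the generic greedy draft-then-verify loop. It is not llama.cpp's implementation: `draft_next` stands in for the bundled MTP head, `target_next` stands in for the full model, and both toy functions are hypothetical. Drafted tokens that the full model agrees with are accepted in bulk; at the first disagreement the full model's own token is kept, so the greedy output matches what the full model would produce on its own.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int = 3) -> List[Token]:
    """One greedy draft-then-verify step; returns the tokens to append."""
    # 1. Draft k tokens cheaply (the role an MTP head plays in self-speculation).
    draft: List[Token] = []
    for _ in range(k):
        draft.append(draft_next(context + draft))
    # 2. Verify each drafted position with the full model. A real engine does
    #    this in one batched forward pass; a loop is used here for clarity.
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the full model's token and stop
            return accepted
        accepted.append(tok)
    # 3. Every draft was accepted, so the verification pass yields a bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(context: List[Token], draft_next: NextFn, target_next: NextFn,
             n_new: int, k: int = 3) -> List[Token]:
    """Repeat speculative steps until n_new tokens have been produced."""
    out = list(context)
    while len(out) - len(context) < n_new:
        out += speculative_step(out, draft_next, target_next, k)
    return out[: len(context) + n_new]


# Hypothetical toy models over digits: the "full model" counts upward mod 10;
# the "draft head" agrees except that it guesses wrong after a 5.
def toy_target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10
```

With these toy functions, `speculative_step([1], toy_draft_next, toy_target_next)` accepts all three drafted tokens plus a bonus token (`[2, 3, 4, 5]`), while starting from `[4]` it stops at the bad draft and returns `[5, 6]`. Either way, `generate` produces the same sequence plain greedy decoding with the full model would.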

IMPACT: Enables local execution of an advanced reasoning model with speculative decoding.
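
Simple arithmetic gives a rough sense of why a 35B-class MoE can run locally: weight storage is roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes an average of 4.5 bits per weight purely for illustration (APEX assigns precision per tensor, so the real file size may differ), treats "35B" as a nominal count, reads "A3B" as roughly 3B parameters active per token, and ignores the KV cache and runtime overhead.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # nominal "35B" total parameters
ACTIVE_PARAMS = 3e9   # "A3B": assumed ~3B parameters active per token
ASSUMED_BPW = 4.5     # illustrative average bits per weight, not the real file's

full_bf16 = weight_gb(TOTAL_PARAMS, 16)               # unquantized 16-bit weights
quantized = weight_gb(TOTAL_PARAMS, ASSUMED_BPW)      # whole quantized model
read_per_token = weight_gb(ACTIVE_PARAMS, ASSUMED_BPW)  # weights touched per token

print(f"BF16 weights:        {full_bf16:.1f} GB")
print(f"Quantized weights:   {quantized:.1f} GB")
print(f"Read per token (MoE): {read_per_token:.2f} GB")
```

Under these assumptions the quantized weights come to roughly 20 GB instead of 70 GB at 16 bits, and because only the active experts are read for each token, per-token memory traffic is closer to that of a ~3B dense model.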



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/PhotographerUSA · 2026-05-31 05:05

mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released!

I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory), enough for ~30-50B-class MoEs, but …