PulseAugur

Gemma4-12B-QAT uncensored model released with 60% speed boost

An uncensored "Balanced" variant of the Gemma4-12B-QAT model has been released. It runs approximately 60% faster thanks to a multi-token-prediction (MTP) draft head used for speculative decoding. The release reports zero refusals on a comprehensive benchmark and supports multimodal input, including vision. The model is optimized for creative writing and role-playing; Qwen3.6 is noted as the stronger choice for agentic coding and tool use.
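For readers unfamiliar with speculative decoding, the idea can be sketched in a few lines. This is a toy illustration only, not the release's implementation: `toy_target`, `toy_draft`, `speculative_decode`, and `greedy_decode` are hypothetical names, the "models" are plain functions over integer tokens, and the draft here is a separate function rather than an MTP head attached to the main model. A cheap draft proposes several tokens, and the expensive target model checks them in what would be a single batched forward pass in a real system.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(target: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, number of verify passes)."""
    out = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(out) < goal:
        # 1. The draft proposes up to k tokens autoregressively (cheap).
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target verifies the proposal. A real model scores every
        #    proposed position in one forward pass; we count it as one pass.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out, passes


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical 'draft': agrees with the target except after token 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


tokens, passes = speculative_decode(toy_target, toy_draft, [0], 12, k=4)
print(tokens == greedy_decode(toy_target, [0], 12), passes)
```

Because rejected draft tokens are replaced by the target's own choice, the output is identical to plain greedy decoding; the gain comes only from needing fewer target passes. How large that gain is in practice depends on how often the draft is right, which is why a figure such as the reported ~60% is workload-dependent.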

IMPACT This release offers a faster, uncensored option for local LLM deployments, potentially improving user experience in creative and role-playing applications.

RANK_REASON Release of a fine-tuned open-source model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/hauhau901 ·

    Gemma4-12B-QAT Uncensored Balanced is out with MTP (~60% speed boost)!

    First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! …