Gemma4-12B-QAT uncensored model released with 60% speed boost

By PulseAugur Editorial · [1 source] · 2026-06-22 15:11

An uncensored "Balanced" variant of the Gemma4-12B-QAT model has been released. It integrates a multi-token-prediction (MTP) draft head for speculative decoding, which is credited with a speed improvement of approximately 60%. The release reports zero refusals on a comprehensive benchmark and offers multimodal capabilities, including vision support. The model is optimized for creative writing and role-playing, while Qwen3.6 is noted as the stronger option for agentic coding and tool use.

IMPACT This release offers a faster, uncensored option for local LLM deployments, potentially improving user experience in creative and role-playing applications.

RANK_REASON Release of a fine-tuned open-source model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/hauhau901 · 2026-06-22 15:11

Gemma4-12B-QAT Uncensored Balanced is out with MTP (~60% speed boost)!

First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! …
