PulseAugur

Google Releases Gemma 4 Models with Quantization-Aware Training

Google has released new Gemma 4 checkpoints trained with Quantization-Aware Training (QAT). QAT simulates low-precision weights during training so that the model stays accurate once its weights are compressed to very low bit-widths, such as 4-bit, or even 2-bit for specific layers. The goal is to let these models run efficiently on consumer hardware with a much smaller memory footprint; the E2B model, for example, requires only about 1 GB.
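To make the idea concrete, here is a minimal sketch of the general QAT technique, not Google's actual training recipe, which the source does not describe. It assumes symmetric per-tensor quantization and a straight-through estimator; the names `fake_quantize` and `train_qat` are hypothetical and exist only for illustration.

```python
import random


def fake_quantize(weights, bits=4):
    """Snap float weights to a signed `bits`-bit grid, then map back to floats.

    Assumption: symmetric per-tensor scaling (one scale for the whole list).
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # integer code
        quantized.append(q * scale)                 # dequantized value
    return quantized


def train_qat(xs, ys, dim, bits=4, lr=0.05, steps=500, seed=0):
    """Fit a linear model whose forward pass only ever sees quantized weights."""
    rng = random.Random(seed)
    latent = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # full-precision copy
    for _ in range(steps):
        w_q = fake_quantize(latent, bits)  # forward pass uses low-bit weights
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(wq * xi for wq, xi in zip(w_q, x)) - y
            for i in range(dim):
                grad[i] += 2 * err * x[i] / len(xs)
        # Straight-through estimator: treat rounding as the identity in the
        # backward pass and apply the gradient to the full-precision weights.
        latent = [w - lr * g for w, g in zip(latent, grad)]
    return latent


if __name__ == "__main__":
    xs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
    ys = [0.5, -0.25, 0.75, 1.0]
    trained = train_qat(xs, ys, dim=3)
    print("deployed 4-bit weights:", fake_quantize(trained, bits=4))
```

Because the loss is always computed on the rounded weights, the optimizer settles on values that still work after rounding, which is the property a QAT checkpoint is meant to preserve at deployment.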

IMPACT Enables efficient on-device AI by significantly reducing model size and memory requirements.
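The memory saving follows from simple arithmetic: weight storage is roughly the parameter count times the bits per weight, divided by 8. The sketch below illustrates this; the parameter count is a hypothetical round number chosen for the example, not a figure from the source, and the estimate ignores activations, caches, and quantization metadata.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, used purely for illustration.
ASSUMED_PARAMS = 2_000_000_000

for bits in (16, 4, 2):
    print(f"{bits}-bit weights: {weight_memory_gb(ASSUMED_PARAMS, bits)} GB")
```

Under that assumption, going from 16-bit to 4-bit weights cuts storage by a factor of four, which is the scale of reduction the summary describes.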

RANK_REASON Frontier-lab model release with system card.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · pueding ·

    Google Ships Gemma 4 QAT Checkpoints: Quantization-Aware Training

    What: Google shipped quantization-aware-trained (QAT) checkpoints for the Gemma 4 family: open weights that were trained to survive being squeezed down to 4-bit (and 2-bit on the decode layers). …