Ollama v0.31.1 improves Gemma 4 MoE model loading

By PulseAugur Editorial · [1 source] · 2026-06-30 04:15

Ollama has released version 0.31.1, which tightens up the MLX loading code for Gemma 4 Mixture of Experts (MoE) models. The .experts.gate_proj, .up_proj, and .down_proj tensor names can now each be used for both quantized (nvfp4, mxfp8) and non-quantized (bf16) models; previously, only non-quantized models used that naming scheme.

IMPACT Enhances the usability and flexibility of running advanced AI models like Gemma 4 MoE on local hardware.

RANK_REASON This is a software release for a tool that facilitates running AI models locally, not a core AI model release or research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Ollama — Releases TIER_1 Nederlands(NL) · pdevine · 2026-06-30 04:15

v0.31.1: mlx: tighten up gemma4 moe loading code (#16964)

This change allows .experts.gate_proj / .up_proj / .down_proj tensor names to each be used for both quantized (i.e. nvfp4 and mxfp8) and non-quantized (bf16) models. Previous to this only non-quantized models used that tensor naming scheme.
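
As a rough illustration of what a shared naming scheme makes possible, the sketch below resolves the three expert projections by the same names whether or not the checkpoint is quantized. It is not Ollama's implementation: the function name, the "layers.0" prefix, the ".weight" suffix, and the idea that a quantized projection carries a companion ".scales" tensor are all assumptions made for the example.

```python
from typing import Dict, Iterable

# Projection names taken from the release note.
EXPERT_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def resolve_expert_tensors(names: Iterable[str], layer_prefix: str) -> Dict[str, dict]:
    """Look up each expert projection under one naming scheme.

    Assumption (not stated in the release note): a quantized projection
    stores a companion tensor ending in ".scales" next to ".weight".
    """
    available = set(names)
    resolved = {}
    for proj in EXPERT_PROJECTIONS:
        base = f"{layer_prefix}.experts.{proj}"
        weight = f"{base}.weight"
        if weight not in available:
            raise KeyError(f"missing expert tensor: {weight}")
        scales = f"{base}.scales"
        has_scales = scales in available
        resolved[proj] = {
            "weight": weight,
            "scales": scales if has_scales else None,
            "quantized": has_scales,
        }
    return resolved


# Hypothetical tensor listings for one layer of a bf16 and a quantized checkpoint.
bf16_names = [f"layers.0.experts.{p}.weight" for p in EXPERT_PROJECTIONS]
quant_names = bf16_names + [f"layers.0.experts.{p}.scales" for p in EXPERT_PROJECTIONS]

bf16_layout = resolve_expert_tensors(bf16_names, "layers.0")
quant_layout = resolve_expert_tensors(quant_names, "layers.0")
```

With one naming scheme, the same lookup serves both checkpoint types, and the only branch is whether the extra quantization tensors are present.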
