PulseAugur

Gemma 4 31B model context window expanded to 80k tokens

A Reddit user shared `llama.cpp` launch settings that raised the usable context window for the Gemma 4 31B model (`gemma-4-31B-it-Q6_K.gguf`, running on an RTX 5090) from 35,000 to 80,000 tokens. The settings include `--ctx-size 80000` and `--flash-attn on`, together with the environment variable `GGML_CUDA_NO_PINNED=1`, among other related parameters. The user noted that the same technique was previously reported for DeepSeek models and has now been applied successfully to Gemma.
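As a rough illustration of how the settings named above fit together, the following Python sketch assembles a launch command and environment. It is only a sketch under assumptions: the summary does not say which `llama.cpp` binary the user ran, so `llama-server` is assumed here, the helper `build_launch` is hypothetical, and the "other related parameters" from the post are not reproduced because they are not listed.

```python
import os
import shlex


def build_launch(model_path, ctx_size=80000, binary="llama-server"):
    """Return (argv, env) for a llama.cpp launch.

    Hypothetical helper: only the settings named in the summary are
    included. The binary name is an assumption, not taken from the post.
    """
    argv = [
        binary,
        "--model", model_path,        # GGUF file to load
        "--ctx-size", str(ctx_size),  # requested context window in tokens
        "--flash-attn", "on",         # flash attention, as named in the summary
    ]
    env = dict(os.environ)
    # Set as an environment variable, not passed as a command-line flag.
    env["GGML_CUDA_NO_PINNED"] = "1"
    return argv, env


argv, env = build_launch("gemma-4-31B-it-Q6_K.gguf")
# Print the equivalent shell line without actually starting the process.
print("GGML_CUDA_NO_PINNED=1 " + shlex.join(argv))
```

To actually start the process, `argv` and `env` could be passed to `subprocess.run(argv, env=env)`. Whether an 80,000-token context fits in practice depends on the GPU, the quantization, and the `llama.cpp` build, so the figures reported in the post should not be assumed to carry over to other setups.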

IMPACT Enables larger context windows for local LLM deployments, potentially improving performance on tasks requiring extensive information recall.

RANK_REASON User-driven change to the runtime settings of an existing model to extend its usable context, rather than a formal release or research paper.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 Dansk(DA) · /u/Defiant_Diet9085 ·

    RTX5090, gemma-4-31B-it-Q6_K.gguf. Context: before - 35k, after - 80k!

    https://www.reddit.com/r/LocalLLaMA/comments/1un6c4s/rtx5090_gemma431bitq6_kgguf_context_before_35k/