PulseAugur

llama.cpp gains 28% context with OpenBLAS build

A user on Reddit's r/LocalLLaMA subreddit reports that building llama.cpp with both the Vulkan and OpenBLAS backends, rather than with Vulkan alone, lets significantly more context fit in VRAM. With the Qwen 3.6 27B model, the context that fit grew from approximately 87,808 tokens to 112,896 tokens, an increase of roughly 28%. The user is investigating whether this is expected behavior, a bug, or an anomaly.
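The headline percentage can be checked directly from the two token counts quoted in the summary. The sketch below is only illustrative arithmetic: the function and variable names are hypothetical, and it makes no claim about why llama.cpp behaves this way.

```python
def context_gain(before_tokens: int, after_tokens: int) -> tuple[int, float]:
    """Return (extra tokens, relative increase as a fraction of the baseline)."""
    extra = after_tokens - before_tokens
    return extra, extra / before_tokens


# Token counts as quoted in the summary (hypothetical variable names).
vulkan_only = 87_808
vulkan_plus_openblas = 112_896

extra, gain = context_gain(vulkan_only, vulkan_plus_openblas)
print(f"{extra} extra tokens, {gain:.1%} more context")
```

The ratio works out to just under 29%, so the "28%" in the headline appears to be the same figure rounded down.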

IMPACT Local LLM deployments may be able to fit a larger context window in the same VRAM.

RANK_REASON User-discovered optimization for open-source LLM inference software.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/Warrenio

    I can fit 28% more context after building llama.cpp with OpenBLAS. Huh?

    I've noticed a weird difference when building llama.cpp with the Vulkan and OpenBLAS backends vs. building with the Vulkan backend only. It seems like llama.cpp can fit significantly more context in VRAM when built with OpenBLAS than when built w…