I can fit 28% more context after building llama.cpp with OpenBLAS. Huh?
A user on Reddit's r/LocalLLaMA reports that building llama.cpp with OpenBLAS support in addition to Vulkan let them fit a noticeably larger context window. With the Qwen 3.6 27B model, the context that fit grew from approximately 87,808 tokens to 112,896 tokens, an increase of about 28%. The user is asking whether this is expected behavior, a bug, or an anomaly.
AI IMPACT: Potential for fitting larger context windows in local LLM deployments.
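As a quick sanity check on the headline figure, the reported token counts can be compared directly. This is a minimal sketch: the variable and function names are illustrative, and the "before" number is assumed to come from the build with Vulkan but without OpenBLAS, as the post describes.

```python
# Token counts as reported in the post (approximate, per the user).
ctx_before = 87_808   # assumed: build with Vulkan only
ctx_after = 112_896   # assumed: build with Vulkan and OpenBLAS


def relative_increase(before: int, after: int) -> float:
    """Return the fractional increase going from `before` to `after`."""
    return (after - before) / before


extra_tokens = ctx_after - ctx_before
increase = relative_increase(ctx_before, ctx_after)

print(f"extra tokens: {extra_tokens}")          # extra tokens: 25088
print(f"relative increase: {increase:.1%}")     # relative increase: 28.6%
```

The result is about 28.6%, consistent with the roughly 28% quoted in the title.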