A user on Reddit's r/LocalLLaMA subreddit reports that compiling llama.cpp with OpenBLAS support in addition to Vulkan coincided with a substantially larger context window. With the Qwen 3.6 27B model, the context window grew from approximately 87,808 tokens to 112,896 tokens, an increase of 25,088 tokens, or roughly 29%. The user is investigating whether this is expected behavior, a bug, or an anomaly.
AI IMPACT Potentially larger usable context windows in local LLM deployments.
RANK_REASON User-discovered optimization for open-source LLM inference software.
AI-generated summary · Google Gemini · from 1 source.