A user on r/LocalLLaMA reports a significant drop in output quality from the Qwen3.5 122B A10B model once the context exceeds approximately 75-80k tokens: the model begins to hallucinate, forget information, and misattribute statements. The user suspects the Q3 quantization is responsible, since their system cannot run Q4 without swapping to disk. They are asking whether this is a model-specific issue, a quantization limitation, or something that specific llama.cpp settings could mitigate, and note that they are already using a BF16 KV cache.
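To make the Q3 versus Q4 memory trade-off concrete, here is a rough back-of-the-envelope sketch. The bits-per-weight figures are assumptions chosen for illustration (real GGUF quantizations mix tensor types, so actual file sizes differ), and `weight_gib` is a hypothetical helper, not part of llama.cpp. Only the 122B total parameter count comes from the model name.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    return n_params * bits_per_weight / 8 / 2**30


# Total parameter count, taken from the "122B" in the model name.
N_PARAMS = 122e9

# ASSUMED average bits per weight; illustrative values, not measured ones.
ASSUMED_BPW = {
    "Q3 (assumed ~3.5 bpw)": 3.5,
    "Q4 (assumed ~4.5 bpw)": 4.5,
}

if __name__ == "__main__":
    for label, bpw in ASSUMED_BPW.items():
        print(f"{label}: {weight_gib(N_PARAMS, bpw):.1f} GiB")
```

Under these assumed values, each extra bit per weight costs about 14 GiB for a 122B-parameter model, which may be enough to push a system from fitting in memory to swapping. The estimate covers weights only; the KV cache is additional and grows with context length.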
IMPACT Highlights a potential limitation of heavily quantized models at long context lengths, which can reduce their usability for complex tasks.
RANK_REASON User-generated discussion about model performance limitations.
AI-generated summary · Google Gemini · from 1 source.