Get you some GPUs, it's not worth the hacks around lack of RAM
A user on r/LocalLLaMA advises that buying enough GPU VRAM is more practical than working around limited memory. They suggest that even older cards such as P40s or MI50s are viable, provided the model fits entirely in VRAM. The user describes running the Qwen3.6-27B model at Q8 quantization with an f16 K/V cache and a 128k context length across two RTX 3090 GPUs.
IMPACT: Suggests prioritizing hardware VRAM over complex software optimizations when running large language models locally.
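
To make "fits entirely in VRAM" concrete, the following is a rough back-of-the-envelope sketch of the memory arithmetic behind the setup described above. It is not taken from the post. The 8.5 bits per weight assumes a llama.cpp-style Q8_0 format (32 int8 values plus one f16 scale per block), and the 24 GiB per card is the RTX 3090's VRAM. The layer count, K/V head count, and head dimension are hypothetical placeholders, since the post does not state the model's architecture; read the real values from the model's config before relying on the result. The formula also assumes every counted layer keeps a full K/V cache, which may not hold for hybrid-attention models.

```python
GIB = 2**30


def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the K and V tensors across all caching layers.

    The leading 2 covers K plus V; bytes_per_elem=2 corresponds to f16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits_in_vram(needed_bytes, vram_per_gpu_bytes, n_gpus, reserve_fraction=0.10):
    """True if the requirement fits after reserving a fraction of VRAM.

    reserve_fraction is an assumed allowance for compute buffers and
    runtime overhead, not a measured figure.
    """
    usable = vram_per_gpu_bytes * n_gpus * (1 - reserve_fraction)
    return needed_bytes <= usable


# From the post: 27B parameters, Q8, f16 K/V cache, 128k context, two RTX 3090s.
weights = weight_bytes(27e9, 8.5)  # 8.5 bits/weight is an assumption (Q8_0-style)

# HYPOTHETICAL architecture values, chosen only to illustrate the formula.
cache = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128,
                       context_len=128 * 1024)

total = weights + cache
print(f"weights: {weights / GIB:.1f} GiB")
print(f"K/V cache: {cache / GIB:.1f} GiB")
print(f"total: {total / GIB:.1f} GiB")
print("fits on 2 x 24 GiB:", fits_in_vram(total, 24 * GIB, 2))
print("fits on 1 x 24 GiB:", fits_in_vram(total, 24 * GIB, 1))
```

The K/V cache grows linearly with context length, so at 128k tokens it can rival the weights themselves in size. Changing the placeholder head count or layer count shifts the total by many gigabytes, which is why the estimate should be rerun with the actual model configuration.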