Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 3h

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

Luce Spark is a new open-source system that runs large Mixture-of-Experts (MoE) language models, in the 33-35 billion parameter range, on a single 16 GB GPU. It keeps only the currently active experts in GPU memory; the remaining experts stay in system RAM and are swapped in as needed. This avoids the performance penalty typically associated with offloading, so models that would otherwise not fit in GPU memory can run efficiently.
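The expert-swapping idea can be illustrated with a toy sketch. This is not Luce Spark's code: the `ExpertCache` class, the `run_layer` helper, the least-recently-used eviction policy, and the scalar "weights" are all assumptions made for illustration, and a dictionary assignment stands in for the host-to-GPU copy.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of experts resident on the 'GPU' (hypothetical, for illustration)."""

    def __init__(self, host_experts, capacity):
        self.host = host_experts   # all experts, kept in system RAM
        self.capacity = capacity   # how many experts fit in GPU memory
        self.gpu = OrderedDict()   # resident experts, least recently used first
        self.hits = 0
        self.misses = 0

    def fetch(self, expert_id):
        if expert_id in self.gpu:
            # Already resident: mark as most recently used.
            self.gpu.move_to_end(expert_id)
            self.hits += 1
        else:
            self.misses += 1
            if len(self.gpu) >= self.capacity:
                # Evict the least recently used expert to make room.
                self.gpu.popitem(last=False)
            # Stand-in for the host-to-device copy of the expert weights.
            self.gpu[expert_id] = self.host[expert_id]
        return self.gpu[expert_id]


def run_layer(cache, routed_ids, x):
    # Each toy "expert" is a single scalar weight; sum the routed experts' outputs.
    return sum(cache.fetch(i) * x for i in routed_ids)


# Eight experts in system RAM, room for four on the GPU.
host = {i: float(i + 1) for i in range(8)}
cache = ExpertCache(host, capacity=4)

# Hypothetical routing decisions for five consecutive tokens.
for routed in [(0, 1), (1, 2), (0, 1), (5, 6), (0, 1)]:
    run_layer(cache, routed, 1.0)

print(cache.hits, cache.misses)
```

In this toy trace, five of the ten expert lookups are served from the resident set without a transfer. How often that happens in a real model depends on how strongly routing favors recently used experts, which the summary does not quantify.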

IMPACT Lets 33-35B MoE models run on a single consumer-grade 16 GB GPU, widening access to models that would otherwise not fit in that much GPU memory.

RTX 3090
Qwen3.6 35B-A3B
Luce Spark
16 GB GPU
Luce-Org/lucebox-hub
Laguna XS.2 33B-A3B