Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax
Luce Spark is a new open-source system that runs large Mixture-of-Experts (MoE) language models, in the 33-35 billion parameter range, on a single 16 GB GPU. It keeps only the currently active experts in GPU memory; the remaining experts stay in system RAM and are swapped in as needed. This avoids the performance penalty typically associated with offloading, so models that would otherwise not fit can run efficiently.
AI IMPACT: Enables running large MoE models on consumer-grade hardware, broadening access to advanced AI capabilities.
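
To make the "keep active experts on the GPU, swap the rest in from RAM" idea concrete, here is a minimal Python sketch of an expert cache with least-recently-used eviction. It is an illustration only, not Luce Spark's actual code: the ExpertCache class, its fetch method, the LRU policy, and the capacity of 4 are all assumptions, and plain strings stand in for real weight tensors and GPU transfers.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy model of GPU-resident experts backed by system RAM (hypothetical)."""

    def __init__(self, host_experts, capacity):
        self.host_experts = host_experts  # every expert's weights, held in system RAM
        self.capacity = capacity          # how many experts fit in GPU memory at once
        self.resident = OrderedDict()     # expert_id -> weights, least recently used first
        self.hits = 0
        self.misses = 0

    def fetch(self, expert_id):
        """Return an expert's weights, swapping it in from RAM on a miss."""
        if expert_id in self.resident:
            # Already on the "GPU": mark it as most recently used.
            self.resident.move_to_end(expert_id)
            self.hits += 1
        else:
            self.misses += 1
            if len(self.resident) >= self.capacity:
                # Evict the least recently used expert to make room.
                self.resident.popitem(last=False)
            # Stand-in for a host-to-device copy of the expert's weights.
            self.resident[expert_id] = self.host_experts[expert_id]
        return self.resident[expert_id]


# Eight experts in RAM, room for four on the GPU, two experts routed per token.
host = {i: f"weights-{i}" for i in range(8)}
cache = ExpertCache(host, capacity=4)
for token_experts in [(0, 1), (1, 2), (0, 3), (4, 1)]:
    for expert_id in token_experts:
        cache.fetch(expert_id)

print(cache.hits, cache.misses, list(cache.resident))
```

Every miss in this sketch corresponds to a transfer from system RAM, which is the cost an offloading system tries to minimize or hide; how Luce Spark actually selects and moves experts is not described in the summary above.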