Luce Spark enables 35B MoE models on 16GB GPUs

By PulseAugur Editorial · [1 source] · 2026-06-08 15:24

Luce Spark is a new open-source system that lets large Mixture-of-Experts (MoE) language models, in the 33-35 billion parameter range, run on a single 16GB GPU. It keeps only the currently active experts on the GPU; the rest stay in system RAM and are swapped in as needed. The approach is presented as avoiding the performance penalty usually associated with offloading, so models that would otherwise not fit in GPU memory can still run efficiently.
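The idea of keeping only active experts on the GPU can be illustrated with a small cache simulation. This is a minimal sketch and not Luce Spark's actual code: the `ExpertCache` class, the `run_tokens` helper, the least-recently-used eviction policy, and the toy routing table are all assumptions made for illustration, and plain Python dictionaries stand in for system RAM and GPU memory.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy model of a GPU-resident expert cache backed by system RAM.

    Hypothetical illustration: the names and the LRU eviction policy are
    assumptions, not a description of Luce Spark's real design.
    """

    def __init__(self, host_experts, gpu_slots):
        self.host_experts = host_experts  # expert_id -> weights (stands in for system RAM)
        self.gpu_slots = gpu_slots        # number of experts that fit on the GPU
        self.gpu = OrderedDict()          # expert_id -> weights (stands in for VRAM)
        self.hits = 0
        self.misses = 0

    def fetch(self, expert_id):
        """Return the expert's weights, copying them to the 'GPU' on a miss."""
        if expert_id in self.gpu:
            self.gpu.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return self.gpu[expert_id]
        self.misses += 1
        if len(self.gpu) >= self.gpu_slots:
            self.gpu.popitem(last=False)     # evict the least recently used expert
        self.gpu[expert_id] = self.host_experts[expert_id]
        return self.gpu[expert_id]


def run_tokens(cache, routed_experts_per_token):
    """Fetch every expert the (toy) router selected for each token."""
    for expert_ids in routed_experts_per_token:
        for expert_id in expert_ids:
            cache.fetch(expert_id)
    return cache.hits, cache.misses


# Eight experts live in "RAM", but only three fit on the "GPU" at once.
host = {i: f"weights-{i}" for i in range(8)}
cache = ExpertCache(host, gpu_slots=3)
routing = [[0, 1], [0, 2], [1, 3], [0, 1]]  # made-up top-2 routing for four tokens
hits, misses = run_tokens(cache, routing)
print(hits, misses)  # prints "3 5"
```

Each miss corresponds to a transfer from system RAM to the GPU, so the hit rate is what determines how much of the offloading cost is actually paid. A real system would also have to move tensors over PCIe and could overlap those transfers with compute, which this sketch does not model.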

IMPACT Enables running large MoE models on consumer-grade hardware, democratizing access to advanced AI capabilities.

RANK_REASON The cluster describes a novel open-source method for running large MoE models on limited hardware, which is a significant research contribution in efficient AI deployment.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/sandropuppo · 2026-06-08 15:24

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

https://www.reddit.com/r/LocalLLaMA/comments/1u0b3cu/luce_spark_a_35b_moe_on_a_16_gb_gpu_without_the/
