235B Qwen3 model runs on 48GB MacBook via custom C++ engine

By PulseAugur Editorial · [1 source] · 2026-07-02 21:44

A developer has run the 235-billion-parameter Qwen3-235B-A22B-Instruct-2507 model on a consumer MacBook with 48 GB of RAM, using a custom C++ engine with Metal kernels that streams the model's experts from the SSD instead of holding them all in memory. The run was slow and imperfect, but it shows that a large frontier model can operate on consumer hardware, challenging the assumption that such models require massive GPU clusters. A key debugging challenge was a chat template mismatch, which was resolved by loading the correct tokenizer.
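The summary does not say how the engine streams experts, so the following is only a minimal C++ sketch of the general idea, not the developer's code. It assumes a hypothetical file layout in which every expert is a fixed-size blob stored back to back, and a hypothetical `ExpertCache` class that keeps a small number of experts resident in RAM and reads the rest from disk on demand, evicting the least recently used one. A real engine would more likely use memory mapping and GPU buffers; plain file reads are used here to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: holds at most `capacity` expert blobs in RAM and
// loads any other expert from disk on demand (least recently used is evicted).
// Assumed layout: expert `id` occupies bytes [id * expert_bytes, (id + 1) * expert_bytes).
class ExpertCache {
public:
    ExpertCache(std::string path, std::size_t expert_bytes, std::size_t capacity)
        : path_(std::move(path)), expert_bytes_(expert_bytes), capacity_(capacity) {
        assert(capacity_ > 0 && expert_bytes_ > 0);
    }

    // Returns the bytes of expert `id`. The reference stays valid only until
    // that expert is evicted by a later call.
    const std::vector<std::uint8_t>& get(std::size_t id) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            ++hits_;
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            return it->second->second;
        }
        ++misses_;
        if (lru_.size() == capacity_) {  // cache full: drop the oldest entry
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(id, read_expert(id));
        index_[id] = lru_.begin();
        return lru_.front().second;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }
    std::size_t resident() const { return lru_.size(); }

private:
    // Reads one expert blob from its fixed offset in the weights file.
    std::vector<std::uint8_t> read_expert(std::size_t id) const {
        std::ifstream in(path_, std::ios::binary);
        if (!in) throw std::runtime_error("cannot open " + path_);
        std::vector<std::uint8_t> buf(expert_bytes_);
        in.seekg(static_cast<std::streamoff>(id * expert_bytes_));
        in.read(reinterpret_cast<char*>(buf.data()),
                static_cast<std::streamsize>(buf.size()));
        if (in.gcount() != static_cast<std::streamsize>(buf.size()))
            throw std::runtime_error("short read for expert");
        return buf;
    }

    using Entry = std::pair<std::size_t, std::vector<std::uint8_t>>;
    std::string path_;
    std::size_t expert_bytes_;
    std::size_t capacity_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::size_t, std::list<Entry>::iterator> index_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

In a design like this, a router would call `get(id)` only for the experts selected for the current token, so memory use is bounded by `capacity * expert_bytes` while every cache miss costs a disk read, which is consistent with the slow generation the summary reports.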

IMPACT Shows that a large frontier model can run on consumer hardware, which could broaden access to such models.

RANK_REASON Demonstrates running a large frontier model on consumer hardware, which is a research-level achievement.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Luca Visciola · 2026-07-02 21:44

"Hello, World!" — A 235-Billion-Parameter Frontier Model Just Spoke on a 48 GB MacBook

This is the second entry in a curious builder's diary. In the first one (https://www.linkedin.com/pulse/rock-paper-silicon-how-web-developer-used-satellite-hack-visciola-w7prf/), a self-taught web developer borrowed a sate…
