Running Claude Code Offline on an M3 Pro with Qwen3.6
A technical guide explains how to run a large language model, Qwen3.6, locally on an Apple M3 Pro laptop so that Claude Code can work in air-gapped environments. The setup uses Ollama with specific configuration and the MLX runner to serve the 35-billion-parameter model, whose mixture-of-experts architecture activates only a subset of its parameters for each token. After four fixes were applied, the system worked through a Kubernetes incident and generated a pull request without any data leaving the machine. The guide concludes that in local deployments like this one, the hardware, not the approach, is what limits speed.
IMPACT Enables air-gapped AI operations in sensitive environments and shows that local LLM deployment is feasible.
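
The summary's point about mixture-of-experts reducing active parameters per token can be illustrated with a minimal top-k routing sketch. This is not Qwen3.6's actual router: the function names, the gate scores, and the expert counts below are hypothetical values chosen only to show the idea that a gate picks a few experts per token and the rest stay idle.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i],
                    reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(num_experts, k):
    """Share of expert blocks that actually run for one token."""
    return k / num_experts


# Hypothetical gate scores for one token across four experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")

# Hypothetical configuration: 8 experts, 2 active per token.
print(f"active expert share: {active_fraction(8, 2):.0%}")
```

Only the selected experts are evaluated for a given token, which is why a model's per-token compute tracks its active parameter count rather than its total size. All weights must still fit in memory, so total size continues to set the memory requirement on a laptop.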