Developer runs Anthropic's Claude Code for free using a local Qwen model

By PulseAugur Editorial · [1 source] · 2026-05-25 15:02

A developer ran a four-hour Claude Code session that processed 7 million tokens without incurring API costs. Instead of calling the Anthropic API, Claude Code's requests were routed through LiteLLM to a Qwen3.6-27B-MTP model served locally by llama.cpp on an AMD GPU. The setup avoids rate limits, keeps code on the developer's own machine, and works offline. The original post provides detailed instructions and hardware requirements for replication.

IMPACT Enables cost-free, private, and offline use of advanced coding models by leveraging local hardware.

RANK_REASON Demonstration of using a proprietary tool with an open-source backend.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Kai Bennett · 2026-05-25 15:02

I ran Claude Code on a local LLM for 4 hours — 7M tokens, $0 (would have cost $94)

Last week I ran a 4-hour autonomous coding session using Claude Code, but not against the Anthropic API. Instead, I routed it through a local llama.cpp (https://github.com/ggerganov/llama.cpp) instance running Qwen3.6-27B-MTP on my…
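The headline figures (4 hours, 7M tokens, an estimated $94 at API prices) imply a blended price and an average throughput, worked out in the short calculation below. It uses only those three numbers. The excerpt does not say how the 7M tokens split between input and output, so both results are aggregate averages and should not be read as an actual price tier or as generation speed.

```python
# Figures taken from the post's headline.
total_tokens = 7_000_000
session_hours = 4
estimated_api_cost_usd = 94.0

# Implied blended price per million tokens, input and output combined.
blended_usd_per_million = estimated_api_cost_usd / (total_tokens / 1_000_000)

# Average tokens handled per second across the whole session.
tokens_per_second = total_tokens / (session_hours * 3600)

print(f"Blended price: ${blended_usd_per_million:.2f} per million tokens")  # 13.43
print(f"Average throughput: {tokens_per_second:.1f} tokens/s")  # 486.1
```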
