I built an open-source proxy that compresses Claude Code's full-price tokens by ~68%, without ever busting the prompt cache
An open-source proxy called llmtrim has been developed to reduce token costs associated with Claude Code. This tool compresses both requests and replies, aiming to preserve the prompt cache discount while decreasing the overall token usage. Initial measurements show significant reductions in token counts for tool outputs and model replies, with minimal latency impact. AI
IMPACT This tool could significantly lower operational costs for users heavily relying on Claude Code, potentially increasing its adoption for cost-sensitive applications.