I burned 603M tokens in seven days without knowing where they went. Hermes Agent's auxiliary block was silently firing background tasks through kimi-k2.6 (1T pa
An AI developer found that their Hermes Agent had consumed 603 million tokens in seven days (roughly 86 million per day) because it was silently running background tasks through the kimi-k2.6 model. The developer switched to explicit routing, assigning each task to a lighter or better-suited model (rnj-1:8b, gemma3:12b, deepseek-v4-flash, or kimi-k2.5), and reports cost reductions of up to 125x.
IMPACT Routing each LLM task to an appropriately sized model, rather than sending everything to the largest one, can substantially reduce operational costs for AI applications.
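
The explicit routing described in the summary can be pictured as a lookup table from task type to model. The following is a minimal sketch of that idea, not Hermes Agent's actual configuration format: the task names and the pairing of tasks to models are hypothetical, since the source names the models but does not say which task went to which one.

```python
# Minimal sketch of explicit task-to-model routing.
# ASSUMPTION: the task names and the task/model pairings below are
# hypothetical; only the model names themselves come from the source.
ROUTES = {
    "title_generation": "rnj-1:8b",
    "summarization": "gemma3:12b",
    "context_compression": "deepseek-v4-flash",
    "vision": "kimi-k2.5",
}


def pick_model(task: str) -> str:
    """Return the model explicitly assigned to a task.

    Raises KeyError for unrouted tasks instead of falling back to a
    default model, so no task can silently run on an expensive model.
    """
    if task not in ROUTES:
        raise KeyError(f"no explicit route for task {task!r}")
    return ROUTES[task]


if __name__ == "__main__":
    for task in ROUTES:
        print(f"{task} -> {pick_model(task)}")
```

Failing loudly on an unrouted task is a deliberate choice in this sketch: a silent default is the kind of behavior that let the background tasks go unnoticed in the first place.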