Brief · PulseAugur

TOOL · Mastodon · fosstodon.org · English (EN) · 1h

I burned 603M tokens in seven days without knowing where they went. Hermes Agent's auxiliary block was silently firing background tasks through kimi-k2.6 (1T pa…

An AI developer discovered that their Hermes Agent setup had consumed 603 million tokens over seven days through background tasks they did not know were running. The cause was Hermes Agent's auxiliary block, which silently sent those tasks through kimi-k2.6 rather than a lighter model. The developer then configured explicit routing, assigning each auxiliary task to a lighter or better-suited model (rnj-1:8b, gemma3:12b, deepseek-v4-flash, or kimi-k2.5), and reports cost reductions of up to 125x.
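The fix described above combines two ideas: every background task gets an explicit model, and token spend is recorded per task so it can be traced. A minimal Python sketch of both follows. The four model names come from the post; the task names (`title_generation`, `session_summary`, and so on), the `route` helper, the `TokenLedger` class, and the task-to-model assignments are hypothetical illustrations, not Hermes Agent's configuration format or API.

```python
from collections import Counter

# Hypothetical auxiliary task names and assignments; only the model
# names are taken from the post.
AUX_ROUTES: dict[str, str] = {
    "title_generation": "rnj-1:8b",
    "session_summary": "gemma3:12b",
    "context_compression": "deepseek-v4-flash",
    "vision_analysis": "kimi-k2.5",
}


class TokenLedger:
    """Accumulates token counts per (task, model) so spend is attributable."""

    def __init__(self) -> None:
        self.usage: Counter = Counter()

    def record(self, task: str, model: str, tokens: int) -> None:
        self.usage[(task, model)] += tokens

    def by_model(self) -> dict[str, int]:
        # Collapse per-task counts into per-model totals.
        totals: Counter = Counter()
        for (_task, model), tokens in self.usage.items():
            totals[model] += tokens
        return dict(totals)


def route(task: str) -> str:
    """Return the model for an auxiliary task.

    Unknown tasks raise instead of silently falling back to a large
    default model, the failure mode the post describes.
    """
    try:
        return AUX_ROUTES[task]
    except KeyError:
        raise ValueError(f"no explicit route for auxiliary task {task!r}") from None


ledger = TokenLedger()
ledger.record("title_generation", route("title_generation"), 1_200)
ledger.record("session_summary", route("session_summary"), 8_000)
print(ledger.by_model())  # {'rnj-1:8b': 1200, 'gemma3:12b': 8000}
```

Raising on an unrouted task is a deliberate choice in this sketch: a new background job fails loudly until someone assigns it a model, rather than quietly running on the most expensive one.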

IMPACT Explicitly routing background LLM tasks to smaller, task-appropriate models can substantially cut operational costs and improve efficiency for AI applications.

kimi-k2.6
deepseek-v4-flash
kimi-k2.5
Hermes Agent
gemma3:12b
rnj-1:8b