Xiaomi's MiMo large-model team has detailed the engineering work behind the API price cut for its MiMo-V2.5 series. The key techniques are a dual-pool KVCache with SWA-aware prefix trees, GCache distributed caching, KVCache-aware scheduling, MTP acceleration during decoding, and multimodal inference optimization. According to the team, the models remain profitable despite the price cut, supported by initiatives such as the "Trillion Token Creator Incentive Plan", which has distributed over 100 trillion free tokens.
IMPACT: Details of these optimization and cost-reduction strategies can help other AI developers deploy models more efficiently.
RANK_REASON: This is a technical deep dive into the optimization of an existing model, not a new model release or a significant benchmark result.
AI-generated summary · Google Gemini · from 1 source.