Brief · PulseAugur

SIGNIFICANT · 量子位 (QbitAI) · Chinese (ZH) · 4d · [10 sources]

Hands-on test of Xiaomi's fastest 1T model: throughput over 1,000 tokens per second, vibe coding delivered in seven seconds

Xiaomi's MiMo team has released MiMo-V2.5-Pro-UltraSpeed, a new inference mode for its 1-trillion-parameter model that exceeds 1,000 tokens per second on commodity GPUs. The speedup is attributed to a combination of FP4 quantization, DFlash speculative decoding, and the TileRT serving system, with no custom hardware required. The company claims the added speed will transform AI applications by enabling faster parallel reasoning, more efficient coding agents, and real-time decision-making.

IMPACT Accelerates real-time AI applications and agentic workflows by drastically reducing inference latency on widely available hardware.
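
For a rough sense of scale, the headline figures can be turned into back-of-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 4-bit weight size follows from the definition of FP4, and the calculation ignores quantization scale factors, activations, and KV cache, none of which the brief describes. It assumes the seven-second figure is generation time at the quoted 1,000 tokens per second, which the brief does not confirm.

```python
def weight_storage_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1e9 bytes).

    Ignores quantization scales, metadata, activations, and KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


def tokens_generated(tokens_per_second: float, seconds: float) -> float:
    """Tokens produced at a constant decoding rate over a time window."""
    return tokens_per_second * seconds


ONE_TRILLION = 1e12  # parameter count quoted in the brief

fp4_gb = weight_storage_gb(ONE_TRILLION, bits_per_weight=4)
fp16_gb = weight_storage_gb(ONE_TRILLION, bits_per_weight=16)  # for comparison only
seven_second_budget = tokens_generated(1000, 7)

print(f"FP4 weights:  ~{fp4_gb:.0f} GB")
print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"Tokens in 7 s at 1000 tok/s: {seven_second_budget:.0f}")
```

Under these assumptions, 4-bit weights take a quarter of the storage of 16-bit weights, and a seven-second response at the quoted rate corresponds to roughly 7,000 generated tokens.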

MiMo-V2.5-Pro-UltraSpeed
Xiaomi
MiMo
TileRT
ChatGPT
DFlash speculative decoding
1-trillion-parameter model
FP4 quantization
commodity GPUs