Xiaomi claims 1,000+ TPS on 1T-parameter MoE model with 8 GPUs

By PulseAugur Editorial · [1 source] · 2026-06-08 15:51

Xiaomi has announced a new large language model, MiMo-V2.5-Pro UltraSpeed, which it claims can generate more than 1,000 tokens per second. This performance was reportedly achieved on a 1-trillion-parameter Mixture-of-Experts (MoE) model running on a standard 8-GPU server. The company presents the result as a significant advance and contrasts it with the specialized hardware solutions used by competitors.

IMPACT This claimed performance could significantly lower the cost and increase the accessibility of running very large models, potentially accelerating adoption.

RANK_REASON The cluster reports on a claimed performance benchmark for a new model, which is a research milestone.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/No-Selection2972 · 2026-06-08 15:51

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1 trillion parameter MoE model. According to them, they’re doing it on a single standar…
