Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server
Xiaomi has announced a new large language model, MiMo-V2.5-Pro UltraSpeed, which it claims can process over 1,000 tokens per second. The company says it reached this throughput with a 1-trillion-parameter Mixture-of-Experts (MoE) model running on a standard 8-GPU server. Xiaomi presents the result as a significant advance and contrasts it with competitors' reliance on specialized hardware.
IMPACT: If the claim holds up, this performance could significantly lower the cost of running very large models and make them more accessible, potentially accelerating adoption.