Xiaomi has announced a new large language model, MiMo-V2.5-Pro UltraSpeed, which it claims can process over 1,000 tokens per second. This performance was reportedly achieved on a 1-trillion-parameter Mixture-of-Experts (MoE) model running on a standard 8-GPU server. The company presents the result as a significant advance, contrasting it with the specialized hardware solutions used by competitors.
AI IMPACT If the claim holds, it could significantly lower the cost of running very large models and make them more accessible, potentially accelerating adoption.
RANK_REASON The cluster reports on a claimed performance benchmark for a new model, which is a research milestone.
AI-generated summary · Google Gemini · from 1 source.