Brief

last 24h

[2/2] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · r/LocalLLaMA English (EN) · 4h

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Xiaomi has announced MiMo-V2.5-Pro UltraSpeed, a new large language model that it claims can process over 1,000 tokens per second. The result was reportedly achieved with a 1-trillion-parameter Mixture-of-Experts (MoE) model running on a standard 8-GPU server. Xiaomi presents this as a significant advance and contrasts it with the specialized hardware solutions used by competitors.

IMPACT This claimed performance could significantly lower the cost and increase the accessibility of running very large models, potentially accelerating adoption.
- Xiaomi
- MiMo-V2.5-Pro UltraSpeed

RESEARCH · Mastodon (mastodon.social) English (EN) · 4h · [2 sources]

🚀 Xiaomi's MiMo-v2.5-Pro-UltraSpeed model is here to redefine "fast" with a staggering 1 trillion parameters and a blazing 1000 TPS, because who doesn't need th

Xiaomi's MiMo team reports a 1-trillion-parameter model that processes over 1,000 tokens per second on commodity GPUs. The speedup is attributed to a combination of FP4 quantization, DFlash speculative decoding, and the TileRT serving system. If the figures hold, the result is a notable advance in efficient deployment of very large models.

IMPACT Demonstrates significant progress in making extremely large models more efficient and accessible on standard hardware.
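
For context on why FP4 quantization matters for fitting a 1-trillion-parameter model on an 8-GPU server, here is a minimal back-of-the-envelope sketch of weight memory at different precisions. It is not from Xiaomi's announcement: the function name `weight_memory_gb` is hypothetical, and the calculation assumes weights are split evenly across GPUs and ignores quantization scale factors, KV cache, and activations.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    overhead for quantization scales or metadata.
    """
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1e12  # 1 trillion parameters, as stated in the item above
NUM_GPUS = 8       # "standard 8-GPU server" from the item above

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    total = weight_memory_gb(NUM_PARAMS, bits)
    # Even split across GPUs is an assumption, not a reported detail.
    print(f"{name}: {total:.0f} GB total, {total / NUM_GPUS:.1f} GB per GPU")
```

Under these assumptions, FP4 weights need roughly a quarter of the memory of FP16 weights, which is one reason low-bit quantization is commonly paired with large MoE models on a single server.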