The Mimo 2.5 Pro large language model has been benchmarked on an 8x Nvidia GB10 cluster. Under single-user conditions, it reached 40 tokens/second with a 1k context, falling to 17 tokens/second with a 250k context. With four parallel requests, throughput rose to 83 tokens/second.
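To put the reported figures in proportion, the short Python sketch below works through the arithmetic. It uses only the three throughput numbers quoted in the summary. Two points are assumptions and not stated in the source: that the 83 tokens/second figure is the combined rate across all four requests, and that it is fair to compare it with the 1k-context single-user rate. The constant and function names are illustrative.

```python
# Throughput figures quoted in the summary (tokens/second).
SINGLE_USER_1K = 40.0     # single user, 1k context
SINGLE_USER_250K = 17.0   # single user, 250k context
PARALLEL_TOTAL = 83.0     # four parallel requests
NUM_PARALLEL = 4


def ratio(value: float, baseline: float) -> float:
    """Return value as a fraction of baseline."""
    return value / baseline


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time needed to emit `tokens` output tokens at a steady rate."""
    return tokens / tokens_per_second


# Share of the 1k-context speed that remains at 250k context.
long_context_retained = ratio(SINGLE_USER_250K, SINGLE_USER_1K)

# Gain from four parallel requests over one user.
# ASSUMPTION: the 1k-context run is the right baseline for this comparison.
parallel_speedup = ratio(PARALLEL_TOTAL, SINGLE_USER_1K)

# ASSUMPTION: 83 tokens/second is the aggregate over all four requests,
# so each request would see roughly a quarter of it.
per_request_rate = PARALLEL_TOTAL / NUM_PARALLEL

if __name__ == "__main__":
    print(f"Speed retained at 250k context: {long_context_retained:.1%}")
    print(f"Speedup with 4 parallel requests: {parallel_speedup:.2f}x")
    print(f"Per-request rate if 83 is aggregate: {per_request_rate:.2f} tok/s")
    print(f"1,000 tokens at 1k context: {seconds_to_generate(1000, SINGLE_USER_1K):.0f} s")
    print(f"1,000 tokens at 250k context: {seconds_to_generate(1000, SINGLE_USER_250K):.0f} s")
```

Under these assumptions, the 250k-context run keeps a little under half of the short-context speed, and four parallel requests roughly double total throughput while each individual request runs slower than a lone user would.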
IMPACT Demonstrates high throughput for large context windows on specialized hardware, potentially influencing local LLM deployment strategies.
AI-generated summary · Google Gemini · from 1 source.