Mimo 2.5 Pro hits 83 t/s on Nvidia GB10 cluster

By PulseAugur Editorial · [1 source] · 2026-05-28 20:18

The Mimo 2.5 Pro large language model has been benchmarked on an 8x Nvidia GB10 cluster. With a single user request, it reached 40 tokens/second at a 1k context, falling to 17 tokens/second at a 250k context. With parallel requests, aggregate throughput was higher, reaching 83 tokens/second with four parallel requests.
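
The reported figures can be put in proportion with a little arithmetic. The sketch below is illustrative only: the single-user numbers and the 2-parallel figure are taken from the source post, and the 4-parallel figure from the summary. It assumes that the parallel figures are aggregate throughput across all requests and that the 40 t/s single-user result is a fair baseline for them; the post excerpt does not state the context length used for the parallel runs. The function and variable names are hypothetical.

```python
# Single-user throughput from the source post: context tokens -> tokens/second.
single_user_tps = {1_000: 40, 30_000: 32, 125_000: 25, 250_000: 17}

# Aggregate throughput by number of parallel requests.
# Assumption: the 1-request baseline is the 1k-context single-user figure.
aggregate_tps = {1: 40, 2: 60, 4: 83}


def retained_fraction(tps_by_context):
    """Throughput at each context length relative to the shortest context."""
    base = tps_by_context[min(tps_by_context)]
    return {ctx: tps / base for ctx, tps in tps_by_context.items()}


def per_request_tps(tps_by_parallel):
    """Average throughput seen by each request, assuming an even split."""
    return {n: tps / n for n, tps in tps_by_parallel.items()}


if __name__ == "__main__":
    for ctx, frac in retained_fraction(single_user_tps).items():
        print(f"{ctx:>7} tokens of context: {frac:.1%} of 1k-context speed")
    for n, tps in per_request_tps(aggregate_tps).items():
        print(f"{n} parallel: about {tps:.2f} t/s per request")
```

Under these assumptions, the 250k-context run retains 42.5% of the 1k-context speed, and four parallel requests average about 20.75 t/s each while roughly doubling total throughput.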

IMPACT Shows usable throughput at context windows up to 250k tokens on a local 8x GB10 cluster, potentially influencing local LLM deployment strategies.

RANK_REASON Benchmark results for a specific model on custom hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/ciprianveg · 2026-05-28 20:18

Mimo 2.5 Pro - 40t/s on 8x Nvidia Spark/GB10 cluster

I got Mimo 2.5 Pro running on my 8x Asus Nvidia GB10 cluster using mtp-2, single user request, coding: 40 t/s - 1k context, 32 t/s - 30k context, 25 t/s - 125k context, 17 t/s - 250k context. 2 parallel reached 60 t/s a…
