Kimi-K2.6 performance on 8x B200 GPUs queried

By PulseAugur Editorial · [1 sources] · 2026-06-05 03:48

A Reddit user is seeking performance estimates for running the Kimi-K2.6 model on an 8x NVIDIA B200 GPU setup. They are specifically interested in throughput figures for long input and output sequences at a concurrency of 32 users. The user also asks whether performance would differ between vLLM and SGLang, and how a distributed setup across InfiniBand would compare to a single NVLink node.

IMPACT Provides insight into potential real-world performance of large models on advanced hardware, informing deployment strategies.

RANK_REASON User query about expected performance for a specific model and hardware configuration.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Acceptable-State-271 · 2026-06-05 03:48

Kimi K2.6 on 8×B200: expected vLLM/SGLang throughput?

I’m planning to run moonshotai/Kimi-K2.6 on 8×NVIDIA B200 with vLLM or SGLang, likely using NVFP4 (or the original QAT model). What real throughput should I expect for: …
