PulseAugur

Kimi-K2.6 throughput on 8x NVIDIA B200 GPUs: user asks for estimates

A Reddit user is seeking throughput estimates for serving the Kimi-K2.6 model on a node with eight NVIDIA B200 GPUs. They specifically want figures for long input and output sequences at a concurrency of 32 users. They also ask how performance might differ between vLLM and SGLang, and how a setup distributed across multiple nodes over InfiniBand would compare with a single NVLink-connected node.
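To make "throughput at a concurrency of 32" concrete, here is a back-of-envelope sketch relating per-user decode speed to aggregate output tokens/s and rough request latency. It is a simplified model under stated assumptions, not a prediction for Kimi-K2.6 on B200: the `Workload` type, the helper functions, and every number in the usage lines are hypothetical placeholders, and the model ignores queueing, the way per-user speed drops as batch size grows, and contention between prefill and decode.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical serving workload; all values are illustrative placeholders."""
    concurrency: int    # simultaneous users (32 in the Reddit question)
    input_tokens: int   # prompt length per request
    output_tokens: int  # generated tokens per request


def aggregate_decode_throughput(per_user_tps: float, concurrency: int) -> float:
    """Total output tokens/s if every concurrent stream decodes at per_user_tps."""
    return per_user_tps * concurrency


def request_latency_s(w: Workload, prefill_tps: float, per_user_tps: float) -> float:
    """Rough end-to-end latency: prefill time plus decode time, no queueing.

    prefill_tps is the (assumed) effective prompt-processing rate one request sees.
    """
    return w.input_tokens / prefill_tps + w.output_tokens / per_user_tps


def requests_per_minute(w: Workload, latency_s: float) -> float:
    """Completed requests/min if all concurrency slots stay busy (closed-loop load)."""
    return w.concurrency * 60.0 / latency_s


if __name__ == "__main__":
    # Hypothetical inputs, NOT measurements of Kimi-K2.6 or B200 hardware.
    w = Workload(concurrency=32, input_tokens=8000, output_tokens=2000)
    print(aggregate_decode_throughput(40.0, w.concurrency))  # 1280.0 tok/s
    lat = request_latency_s(w, prefill_tps=8000.0, per_user_tps=40.0)
    print(lat)  # 51.0 seconds
    print(f"{requests_per_minute(w, lat):.1f} requests/min")
```

In practice, the per-user decode rate at 32 concurrent streams is exactly the figure the thread is asking benchmarks to supply; once measured, the same arithmetic converts it into aggregate capacity.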

IMPACT Answers in the thread could indicate real-world serving throughput for very large models on current-generation hardware, informing deployment choices such as inference engine and single-node versus multi-node layout.

RANK_REASON User query about expected performance for a specific model and hardware configuration.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/Acceptable-State-271

    Kimi K2.6 on 8×B200: expected vLLM/SGLang throughput?

    I’m planning to run moonshotai/Kimi-K2.6 on 8×NVIDIA B200 with vLLM or SGLang, likely using NVFP4 (or the original QAT model). What real throughput should I expect for: …