My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config
A user shared their Docker deployment configuration for GLM-5.2-FP8 on an HGX-H200 system using SGLang. The configuration reached a 262k-token context window at a throughput of 70 tokens per second. The user noted that certain options, such as data parallelism (DP) and moe-a2a-backend, were disabled to improve performance. They also reported that the official vLLM recipes did not work on H200 because of FP8 quantization on the DSV3 (DeepSeek-V3) architecture.
IMPACT: Provides practical insight into tuning context length and throughput for large-model serving on a specific hardware setup.