PulseAugur

Study finds switchless networks more cost-effective for MoE LLM serving

A new paper analyzes network topologies for serving Mixture-of-Experts (MoE) large language models (LLMs) and finds that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructure. According to the study, reducing link bandwidth in current scale-up networks could improve cost-effectiveness by up to 27%. Switchless topologies, particularly the 3D full-mesh, offer a better performance-cost tradeoff, and the authors expect this advantage to persist with future GPU generations.
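To make the "3D full-mesh" idea concrete, here is a minimal Python sketch that counts the direct links such a topology needs. It assumes the common meaning of the term (nodes on an X×Y×Z grid, each wired directly to every node that differs from it in exactly one coordinate); the summary above does not give the paper's exact definition, and the function names and the 4×4×4 grid are hypothetical, chosen only for illustration.

```python
from itertools import product


def full_mesh_3d_links(dims):
    """Enumerate undirected links of a 3D full-mesh on a grid of shape dims.

    Assumption (not taken from the paper): two nodes are linked when their
    coordinates differ in exactly one of the three dimensions.
    """
    nodes = list(product(*(range(d) for d in dims)))
    links = set()
    for a in nodes:
        for b in nodes:
            differing = sum(1 for i in range(3) if a[i] != b[i])
            # a < b keeps each undirected link only once
            if differing == 1 and a < b:
                links.add((a, b))
    return links


def links_per_node(dims):
    """Direct links per node: one to every other node along each dimension."""
    return sum(d - 1 for d in dims)


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical 64-node cluster
    print("links per node:", links_per_node(dims))
    print("total links:", len(full_mesh_3d_links(dims)))
```

Under this assumption every node reaches any other node in at most three hops without passing through a switch, which is the kind of wiring a switchless design trades against per-link bandwidth.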

Impact: Suggests that optimizing network topology could yield significant cost savings for LLM serving infrastructure.

Ranking rationale: Academic paper analyzing infrastructure for LLM serving.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv cs.AI · Tier 1 · English (EN) · Junsun Choi, Sam Son, Sunjin Choi, Hansung Kim, Yakun Sophia Shao, Scott Shenker, Sylvia Ratnasamy, Borivoje Nikolic

    Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

    arXiv:2605.00254v1 Announce Type: cross Abstract: Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive …

  2. arXiv cs.AI · Tier 1 · English (EN) · Borivoje Nikolic

    Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

    Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whet…