A new paper analyzes network topologies for serving Mixture-of-Experts (MoE) large language models (LLMs). It finds that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructure. According to the research, reducing link bandwidth in current scale-up networks could improve cost-effectiveness by up to 27%. The study suggests that switchless topologies, particularly the 3D full-mesh, offer a better performance-cost tradeoff, and that this advantage is expected to persist in future GPU generations.
Impact: Suggests significant cost savings for LLM serving infrastructure through optimized network topologies.
Ranking rationale: Academic paper analyzing network infrastructure for LLM serving.
AI-generated summary · Google Gemini · from 2 sources.