PulseAugur
ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

ASTRA-sim 3.0 enhances ML simulation with GPU and infrastructure modeling

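Researchers have released ASTRA-sim 3.0, an updated open-source simulator for distributed machine learning. The new version improves simulation fidelity through fine-grained, cache-line-level modeling of GPU execution and infrastructure. It also introduces InfraGraph, a standard representation for network infrastructure that enables more detailed design-space exploration of collective algorithms and hardware architectures.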

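Impact: Enables more accurate simulation of distributed machine learning workloads, which could accelerate the design of efficient AI infrastructure and algorithms.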

Ranking rationale: This is a research paper detailing an updated simulation tool for distributed machine learning.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.LG TIER_1 English(EN) · William Won, Jinsun Yoo, Tuan Ta, Moumita Dey, Andy Balogh, Pradosh Datta, Furkan Eris, Conor Green, Winston Liu, Changhai Man, Kingshuk Mandal, Amos Rai, Vinay Ramakrishnaiah, Ruchi Shah, David Sidler, Harsh Sikhwal, Hanjiang Wu, Tushar Krishna, Bradfor… ·

    ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

    arXiv:2606.10440v1 Announce Type: cross Abstract: Distributed machine learning (ML) is a key paradigm for today's large-scale artificial intelligence applications. As model inference arises as an important use case, faithful modeling of latency-sensitive collective communication …