Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

Flash-GMM kernel speeds up GMM clustering 20× and supports larger datasets

By the PulseAugur editorial team · [2 sources] · 2026-06-09 14:07

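Researchers have developed Flash-GMM, a new fused Triton kernel designed for efficient computation of Gaussian Mixture Models (GMMs) on GPUs. By avoiding full materialization of the responsibility matrix, the kernel substantially reduces memory requirements, achieving a 20× speedup and making it possible to process datasets 100× larger than before on a single device. Flash-GMM has been integrated into approximate nearest neighbor (ANN) search, where it offers a viable alternative to k-means clustering and improves recall.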
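To make the memory argument concrete, here is a minimal NumPy sketch of the general idea of not materializing the full N×K responsibility matrix: responsibilities are computed one row chunk at a time and immediately folded into running sufficient statistics. This is not the Flash-GMM Triton kernel. The function name `em_step_chunked`, the diagonal-covariance assumption, and the `chunk_size` parameter are all illustrative choices that do not come from the paper.

```python
import numpy as np

def em_step_chunked(X, weights, means, variances, chunk_size=1024):
    """One EM iteration for a diagonal-covariance GMM (assumed form, illustration only).

    Only a (chunk_size, K) block of responsibilities exists at any time;
    the full (N, K) matrix is never stored.
    """
    n, d = X.shape
    k = means.shape[0]
    nk = np.zeros(k)         # soft counts per component
    sx = np.zeros((k, d))    # responsibility-weighted sum of x
    sxx = np.zeros((k, d))   # responsibility-weighted sum of x^2
    log_lik = 0.0
    # Per-component log weight plus Gaussian normalization constant.
    log_norm = np.log(weights) - 0.5 * (
        d * np.log(2 * np.pi) + np.log(variances).sum(axis=1)
    )
    for start in range(0, n, chunk_size):
        xb = X[start:start + chunk_size]                  # (B, d)
        diff = xb[:, None, :] - means[None, :, :]         # (B, K, d)
        log_p = log_norm - 0.5 * (diff ** 2 / variances).sum(axis=2)  # (B, K)
        # Numerically stable log-sum-exp over components.
        m = log_p.max(axis=1, keepdims=True)
        log_z = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        r = np.exp(log_p - log_z)                         # chunk responsibilities
        # Fold the chunk into the running statistics, then discard it.
        nk += r.sum(axis=0)
        sx += r.T @ xb
        sxx += r.T @ (xb ** 2)
        log_lik += float(log_z.sum())
    new_weights = nk / n
    new_means = sx / nk[:, None]
    # Small floor keeps variances positive (illustrative constant).
    new_variances = sxx / nk[:, None] - new_means ** 2 + 1e-6
    return new_weights, new_means, new_variances, log_lik
```

Peak extra memory in this sketch scales with `chunk_size × K` rather than `N × K`. A fused GPU kernel would presumably perform the equivalent accumulation on-chip, but the details of how Flash-GMM does so are not given in the summary above.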

Impact: Speeds up GMM clustering on large-scale data, which may improve the performance of applications such as ANN search.

Ranking rationale: This cluster contains an academic paper detailing a new kernel for GMM clustering.

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo · 2026-06-10 04:00

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

arXiv:2606.10896v1 Announce Type: new Abstract: We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GP…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Assaf Toledo · 2026-06-09 14:07

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a 20×…
