PulseAugur

MaxSketch algorithm improves distinct counting in noisy data streams

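Researchers have developed MaxSketch, a new algorithm for robustly estimating the number of distinct elements in a data stream, particularly when the data are high-dimensional and noisy. Traditional methods assume that repeated elements are identical and break down when repeats are only approximately similar. MaxSketch instead relies on random Gaussian projections and, according to the authors, achieves substantially better memory efficiency. The approach is aimed in particular at learned representations, and the authors report accurate estimates in experiments on image streams, bridging classical streaming algorithms and modern representation learning.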
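The summary does not spell out MaxSketch's actual construction, so what follows is only a toy Python sketch of the intuition behind it, not the paper's algorithm. It assumes a hypothetical setup: each object is a d-dimensional vector, each repeat is that vector plus small Gaussian noise, the projection is a k×d matrix with i.i.d. N(0, 1/k) entries, and an item counts as a repeat when its projection lies within a hypothetical threshold `tau` of a stored one. The helper names (`make_projection`, `project`, `exact_distinct`, `projected_distinct`) are illustrative only.

```python
import math
import random


def make_projection(d, k, rng):
    # Hypothetical: k x d matrix with i.i.d. N(0, 1/k) entries (Johnson-Lindenstrauss
    # scaling, so squared lengths are preserved in expectation). Not taken from the paper.
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(G, x):
    # y = G x: maps a d-dimensional vector to k dimensions.
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def exact_distinct(stream):
    # Classical assumption: repeats are bit-identical, so an exact set suffices.
    return len({tuple(x) for x in stream})


def projected_distinct(stream, G, tau):
    # Hypothetical toy rule: keep one k-dim representative per object seen so far;
    # an item is a repeat if its projection is within distance tau of a representative.
    reps = []
    for x in stream:
        y = project(G, x)
        if not any(math.dist(y, r) <= tau for r in reps):
            reps.append(y)
    return len(reps)


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, n_objects, copies, noise = 256, 16, 5, 20, 1e-3
    centers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_objects)]
    # Each object appears `copies` times, perturbed by small Gaussian noise.
    stream = [[c + rng.gauss(0.0, noise) for c in center]
              for center in centers for _ in range(copies)]
    rng.shuffle(stream)
    G = make_projection(d, k, rng)
    print(exact_distinct(stream))                  # 100: every noisy copy looks new
    print(projected_distinct(stream, G, tau=1.0))  # 5 for these well-separated objects
```

Random projections roughly preserve Euclidean distances, so noisy copies stay close while distinct objects stay far apart, and the representatives cost k floats each instead of d. Unlike a bounded-size streaming sketch, this toy still stores one representative per distinct object, so it illustrates only why projection helps with near-duplicates, not the memory bounds reported for MaxSketch.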

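Impact: Introduces a more memory-efficient approach to distinct counting in noisy, high-dimensional data streams, which is relevant to large-scale machine learning applications.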

Ranking rationale: An academic paper introducing a new algorithm for data stream processing.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv stat.ML · TIER_1 · English (EN) · Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

    MaxSketch: Robust Distinct Counting in Streams via Random Projections

    arXiv:2605.15571v1 Announce Type: new Abstract: Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object …

  2. arXiv stat.ML · TIER_1 · English (EN) · Christos Tzamos

    MaxSketch: Robust Distinct Counting in Streams via Random Projections

    Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, d…
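The abstract's starting point, distinct counting when repeats are identical, is classically handled with hash-based sketches. As a hedged illustration of that classical baseline (not part of MaxSketch), here is a minimal k-minimum-values (KMV) sketch in Python; the SHA-256-based `unit_hash`, the class name `KMV`, and the parameter `k` are illustrative choices. Because it hashes exact bytes, identical repeats collapse to a single hash value, whereas noisy copies of the same object hash independently and are all counted, which is the failure mode the paper targets. Here a string suffix stands in for noise.

```python
import hashlib
import heapq


def unit_hash(item: bytes) -> float:
    # Map bytes to a pseudo-uniform value in [0, 1) using the first 8 bytes of SHA-256.
    digest = hashlib.sha256(item).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMV:
    """k-minimum-values distinct-count sketch for the exact-repeat setting."""

    def __init__(self, k: int):
        self.k = k
        self.heap = []     # negated hashes: a max-heap holding the k smallest values
        self.seen = set()  # hashes currently kept, so exact repeats are ignored

    def add(self, item: bytes) -> None:
        v = unit_hash(item)
        if v in self.seen:
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -v)
            self.seen.add(v)
        elif v < -self.heap[0]:
            # Replace the current largest kept hash with the smaller new one.
            removed = -heapq.heappushpop(self.heap, -v)
            self.seen.discard(removed)
            self.seen.add(v)

    def estimate(self) -> float:
        if len(self.heap) < self.k:
            return len(self.heap)  # fewer than k distinct hashes: the count is exact
        return (self.k - 1) / (-self.heap[0])  # standard KMV estimator


if __name__ == "__main__":
    exact, noisy = KMV(k=1024), KMV(k=1024)
    for obj in range(100):
        for copy in range(5):
            exact.add(f"object-{obj}".encode())               # identical repeats
            noisy.add(f"object-{obj}|noise-{copy}".encode())  # perturbed repeats
    print(exact.estimate())  # 100
    print(noisy.estimate())  # 500: each perturbed copy counts as a new element
```

Memory stays at k hash values regardless of stream length, which is why the exact-repeat case is considered well understood; the sketch offers no protection once repeats are only approximately equal.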