PulseAugur

MaxSketch algorithm improves distinct counting in noisy data streams

Researchers have developed MaxSketch, an algorithm for robustly estimating the number of distinct elements in a data stream when observations are high-dimensional and noisy. Traditional distinct-counting methods assume that repeated elements are identical, so they break down when repeats are only approximately similar. MaxSketch instead relies on random Gaussian projections, which yields significantly improved memory efficiency. The approach is particularly suited to learned representations and was accurate in experiments on image streams, bridging the gap between classical streaming algorithms and modern representation learning.
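To make the idea of counting with random Gaussian projections concrete, here is a minimal sketch in Python. It is not the paper's estimator, whose details this summary does not give. It assumes unit-norm, nearly orthogonal embeddings and small noise, so that each projection behaves roughly like an independent N(0, 1) draw, and it inverts the median of the maximum of n such draws (Phi(m)^n = 1/2). All function names and parameters (`make_projections`, `update`, `estimate`, `run_demo`, `noise`, and so on) are hypothetical.

```python
import math
import random
from statistics import median

def make_projections(k, d, rng):
    """Draw k random Gaussian directions in R^d, one per sketch register."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

def update(sketch, projections, x):
    """Fold one stream item into the sketch: keep the largest projection per direction."""
    for j, g in enumerate(projections):
        p = sum(gi * xi for gi, xi in zip(g, x))
        if p > sketch[j]:
            sketch[j] = p

def estimate(sketch):
    """Illustrative estimator (an assumption, not the paper's): solve Phi(m)^n = 1/2 for n,
    where m is the median register. Requires a non-empty stream."""
    m = median(sketch)
    phi = 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))  # standard normal CDF at m
    return math.log(0.5) / math.log(phi)

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def run_demo(n_distinct=40, copies=5, d=64, k=128, noise=0.002, seed=0):
    """Stream `copies` noisy views of each of `n_distinct` random unit vectors."""
    rng = random.Random(seed)
    projections = make_projections(k, d, rng)
    objects = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n_distinct)]
    stream = [unit([c + rng.gauss(0.0, noise) for c in obj])
              for obj in objects for _ in range(copies)]
    rng.shuffle(stream)
    sketch = [-math.inf] * k  # memory is k floats, independent of stream length
    for x in stream:
        update(sketch, projections, x)
    return sketch, estimate(sketch)

if __name__ == "__main__":
    _, n_hat = run_demo()
    print(f"estimated distinct objects: {n_hat:.1f} (true value: 40)")
```

Because each register stores only a running maximum, exact repeats leave the sketch unchanged and slightly perturbed repeats move it only slightly. A hash-based counter, by contrast, would treat every noisy copy as a new element.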

Impact: Introduces a more memory-efficient method for distinct counting in noisy, high-dimensional data streams, relevant for large-scale machine learning applications.

Ranking rationale: Academic paper introducing a new algorithm for data stream processing.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv stat.ML TIER_1 English(EN) · Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

    MaxSketch: Robust Distinct Counting in Streams via Random Projections

    arXiv:2605.15571v1. Abstract: Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object …

  2. arXiv stat.ML TIER_1 English(EN) · Christos Tzamos

    MaxSketch: Robust Distinct Counting in Streams via Random Projections

    Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, d…