PulseAugur

Paper defines minimal dimension for exact top-k retrieval

This paper introduces the Minimal Embeddable Dimension (MED): the smallest dimension in which m objects can be embedded so that every subset of at most k objects is retrieved exactly by score comparison. The authors show that MED grows linearly with k and does not depend on the number of objects m, for three common similarity measures: inner product, Euclidean distance, and cosine similarity. They also study Robust MED (RMED), which additionally requires a score gap, and derive a feasibility ceiling. Simulations and experiments on real datasets indicate that embedding-based retrieval can be effective, challenging the notion that limited geometric capacity prevents exact retrieval.
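To make "exact retrieval by score comparison" concrete, here is a minimal sketch for the inner-product case. It uses the classical moment-curve (cyclic polytope) construction, which is an assumption chosen for illustration and is not claimed to be the paper's own construction. Each object i is placed at (t_i, t_i^2, ..., t_i^(2k)) in R^(2k), and the query for a target subset S is read off the polynomial p(t) = -prod_{i in S} (t - t_i)^2, which is zero on S and negative everywhere else. The function names and the choice of integer nodes t_i = 0, 1, ..., m-1 are hypothetical.

```python
from itertools import combinations

import numpy as np


def moment_curve_embedding(m, k):
    """Embed m objects on the moment curve t -> (t, t^2, ..., t^(2k)) in R^(2k)."""
    # Distinct nodes; small integers keep the floating-point arithmetic exact here.
    t = np.arange(m, dtype=float)
    # Vandermonde columns are t^0 ... t^(2k); drop the constant column t^0.
    X = np.vander(t, 2 * k + 1, increasing=True)[:, 1:]
    return t, X


def query_for_subset(t, subset, k):
    """Query q with <q, x_i> = p(t_i) - p(0), where p(t) = -prod_{i in S} (t - t_i)^2."""
    roots = np.repeat(t[list(subset)], 2)   # each target node is a double root
    coeffs = -np.poly(roots)[::-1]          # coefficients of p, lowest degree first
    q = np.zeros(2 * k)
    q[: len(roots)] = coeffs[1:]            # skip the constant term; pad if |S| < k
    return q


def retrieves_exactly(X, q, subset):
    """True if every object in the subset scores strictly above every other object."""
    scores = X @ q
    inside = scores[list(subset)]
    outside = np.delete(scores, list(subset))
    return inside.min() > outside.max()


def check_all_subsets(m, k):
    """Check exact retrieval for every non-empty subset of size at most k."""
    t, X = moment_curve_embedding(m, k)
    return all(
        retrieves_exactly(X, query_for_subset(t, subset, k), subset)
        for size in range(1, k + 1)
        for subset in combinations(range(m), size)
    )


if __name__ == "__main__":
    print(check_all_subsets(m=8, k=2))
```

The embedding dimension here is 2k regardless of m, which matches the claim in the paper's title. The sketch says nothing about score gaps: with larger m or k the powers of t grow quickly and the margin between retrieved and non-retrieved objects becomes numerically fragile, which is the kind of concern a robust variant such as RMED would address.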

IMPACT Provides theoretical underpinnings for embedding-based retrieval, potentially guiding future LLM embedding strategies and evaluation.

RANK_REASON Academic paper published on arXiv detailing theoretical and empirical findings on embedding-based retrieval.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Zihao Wang, Hang Yin, Lihui Liu, Hanghang Tong, Yangqiu Song, Ginny Wong, Simon See

    $\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval

    arXiv:2601.20844v3 Announce Type: replace-cross Abstract: This paper studies the Minimal Embeddable Dimension (MED): the least dimension in which there exists a configuration of $m$ object vectors so that every subset of size at most $k$ is exactly retrieved by score comparison. …