This paper introduces the Minimal Embeddable Dimension (MED): the smallest dimension in which a set of objects can be embedded so that specified subsets are retrieved exactly by score comparison. It shows that MED is proportional to k, the size of the subsets to be retrieved, and independent of the number of objects m, for common similarity measures: inner product, Euclidean distance, and cosine similarity. It also studies a Robust MED (RMED) that requires a score gap, derives a feasibility ceiling, and reports simulations and experiments on real datasets indicating that embedding-based retrieval can be effective, challenging the notion that limits on geometric capacity prevent exact retrieval.
AI IMPACT Provides theoretical underpinnings for embedding-based retrieval, potentially guiding future LLM embedding strategies and evaluation.
RANK_REASON Academic paper published on arXiv detailing theoretical and empirical findings on embedding-based retrieval.
AI-generated summary · Google Gemini · from 1 source.