Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CL English(EN) · 1d

Pooling and Semantic Shift: The Fundamental Challenges in Long Text Embedding and Retrieval

A new paper on arXiv identifies pooling operations and semantic shift, rather than text length or attention mechanisms alone, as the primary drivers of embedding collapse in long text. The authors develop a theoretical framework showing that contextual pooling inherently causes semantic dilution and the spatial concentration of embedding vectors. Their experiments show that semantic shift is the main predictor of embedding concentration, and that anisotropy is harmful only when it stems from significant semantic shift. Together, these results offer a new explanation for the difficulties of long-context retrieval.
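To make "semantic dilution" and "concentration" concrete, here is a toy sketch that mean-pools synthetic token vectors. It is an illustration under stated assumptions, not the paper's framework or experimental setup. It assumes each token vector is its topic direction plus one shared direction (a crude stand-in for anisotropy) plus small noise, and it models semantic shift as a text drifting across more topics. All names (experiment, make_text, DIM, shared_weight) are hypothetical.

```python
import math
import random

DIM = 512  # hypothetical embedding width for this toy example


def unit(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pool(vectors):
    """Mean pooling: average token vectors into a single text embedding."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def random_unit(rng):
    """Random direction in DIM dimensions."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def make_text(rng, topics, shared, shared_weight=0.5, tokens_per_topic=8, noise=0.01):
    """Hypothetical token vectors: token = topic + shared_weight * shared + noise.
    A text that drifts across several topics models semantic shift."""
    tokens = []
    for topic in topics:
        for _ in range(tokens_per_topic):
            tokens.append([t + shared_weight * s + rng.gauss(0.0, noise)
                           for t, s in zip(topic, shared)])
    return tokens


def experiment(num_topics, num_texts=6, seed=0):
    """Return (mean cosine of each pooled text to its first topic,
               mean pairwise cosine between pooled texts)."""
    rng = random.Random(seed)
    shared = random_unit(rng)  # assumed common (anisotropic) direction
    pooled, first_topics = [], []
    for _ in range(num_texts):
        topics = [random_unit(rng) for _ in range(num_topics)]
        pooled.append(mean_pool(make_text(rng, topics, shared)))
        first_topics.append(topics[0])
    to_first = sum(cosine(p, t) for p, t in zip(pooled, first_topics)) / num_texts
    pairs = [(i, j) for i in range(num_texts) for j in range(i + 1, num_texts)]
    between = sum(cosine(pooled[i], pooled[j]) for i, j in pairs) / len(pairs)
    return to_first, between


if __name__ == "__main__":
    for k in (1, 4, 16):
        to_first, between = experiment(k)
        print(f"topics={k:2d}  cos(text, first topic)={to_first:.2f}  "
              f"cos(text, other texts)={between:.2f}")
```

In this toy setup, adding topics should pull each pooled vector away from its opening topic (dilution) and toward the shared direction, so unrelated long texts end up looking increasingly alike (concentration).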

IMPACT Provides a theoretical framework and experimental evidence to address fundamental challenges in long text embedding, potentially improving retrieval systems.
- arXiv
- Hang Gao
RESEARCH · arXiv cs.LG English(EN) · 4d · [2 sources]

LEMUR: Learned Multi-Vector Retrieval

Researchers have introduced two new methods to improve the efficiency and effectiveness of dense vector retrieval, a core component of modern machine learning systems. The first, VRSD, addresses the trade-off between similarity and diversity in search results: it formulates result selection as a novel optimization problem, solves it with a parameter-free heuristic, and outperforms existing baselines. The second, LEMUR, tackles the latency of multi-vector retrieval by casting it as a supervised learning problem that reduces inference to single-vector search, yielding significant speedups.
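For context on the similarity/diversity trade-off that VRSD targets, here is a minimal sketch of the classic Maximal Marginal Relevance (MMR) heuristic. This is a swapped-in, well-known baseline, not VRSD. Unlike VRSD, which the brief describes as parameter-free, MMR needs a hand-tuned weight (lam here). The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k, lam=0.5):
    """Maximal Marginal Relevance: greedily pick up to k candidate indices, each time
    maximizing lam * sim(query, c) - (1 - lam) * max sim(c, already selected)."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Candidate 1 nearly duplicates candidate 0; candidate 2 is less similar but distinct.
    candidates = [[1.0, 0.1], [1.0, 0.12], [1.0, -1.0]]
    print(mmr(query, candidates, k=2))           # diversity-aware selection
    print(mmr(query, candidates, k=2, lam=1.0))  # pure similarity ranking
```

With lam=1.0 the redundancy term vanishes and the selection reduces to plain top-k by similarity, which returns the near-duplicate pair. With lam=0.5 the second pick switches to the distinct candidate.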

IMPACT These advancements in vector retrieval could lead to more efficient and accurate semantic search and retrieval-augmented generation systems.
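To show where multi-vector latency comes from, here is a minimal sketch of standard late-interaction (MaxSim) scoring, as popularized by ColBERT, next to single-vector scoring. It does not reproduce LEMUR's learned reduction; it only illustrates the cost that LEMUR is reported to avoid. The function names are hypothetical, and the scan is exhaustive for clarity.

```python
def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take its best dot
    product against any document token vector, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def single_vector_score(query_vec, doc_vec):
    """Single-vector score: one dot product per (query, document) pair."""
    return dot(query_vec, doc_vec)


def rank_multi_vector(query_vecs, corpus):
    """Exhaustively rank documents by MaxSim. Cost grows with
    len(query_vecs) * total number of document token vectors."""
    scores = [maxsim(query_vecs, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]          # two query token vectors
    corpus = [
        [[1.0, 0.0], [0.0, 0.5]],             # doc 0: MaxSim = 1.0 + 0.5
        [[0.9, 0.9]],                         # doc 1: MaxSim = 0.9 + 0.9
        [[0.0, 0.0]],                         # doc 2: MaxSim = 0.0
    ]
    print(rank_multi_vector(query, corpus))
```

As a hypothetical illustration, a 32-token query scored exhaustively against 1,000 documents of 200 token vectors each needs 32 × 200 × 1,000 = 6.4 million dot products, compared with 1,000 for single-vector scoring. Practical systems add approximate indexes, but the gap explains why reducing inference to single-vector search can yield large speedups.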
