PulseAugur

Paper explains why mean pooling is effective for text embeddings

This paper investigates the effectiveness of mean pooling, the common technique of averaging token embeddings to produce a text embedding. The researchers develop a metric that quantifies information loss in second-order statistics, which occurs when distinct token-embedding distributions are mapped to similar text embeddings. Their findings indicate that modern text encoders, particularly those fine-tuned with contrastive learning, are robust to this collapse, and that this robustness correlates positively with downstream task performance.

Summary written by gemini-2.5-flash-lite from 2 sources.
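
To make the collapse concrete, here is a minimal NumPy sketch, not the paper's actual metric: two token-embedding sets are built to share the same mean, so mean pooling maps them to the same text embedding, while their covariances (a second-order statistic) differ sharply. The function names and the covariance-distance measure are illustrative assumptions.

```python
import numpy as np

def mean_pool(tokens: np.ndarray) -> np.ndarray:
    """Mean pooling: average a (T, d) matrix of token embeddings into one (d,) vector."""
    return tokens.mean(axis=0)

def second_order_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical stand-in for the paper's measure: Frobenius distance between
    token-embedding covariance matrices, i.e. second-order information that
    mean pooling discards."""
    return float(np.linalg.norm(np.cov(a, rowvar=False) - np.cov(b, rowvar=False), ord="fro"))

rng = np.random.default_rng(0)
mu = rng.normal(size=8)

# Two "sentences" constructed to have identical token means but very different spread.
noise_a = rng.normal(size=(12, 8))
noise_a -= noise_a.mean(axis=0)   # center so the sample mean is exactly mu
noise_b = rng.normal(size=(12, 8))
noise_b -= noise_b.mean(axis=0)
sent_a = mu + 0.1 * noise_a       # tight token distribution
sent_b = mu + 2.0 * noise_b       # wide token distribution

print(np.linalg.norm(mean_pool(sent_a) - mean_pool(sent_b)))  # ~0: pooled embeddings coincide
print(second_order_gap(sent_a, sent_b))                       # large: covariances differ
```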

IMPACT Provides a new metric for evaluating text encoders, potentially guiding future model development and fine-tuning strategies.

RANK_REASON Academic paper published on arXiv detailing a new metric for text embeddings.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Tomomasa Hara, Hiroto Kurita, Masaaki Imaizumi, Kentaro Inui, Sho Yokoi

    Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

    arXiv:2604.27398v1 · For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse in…

  2. arXiv cs.CL TIER_1 · Sho Yokoi

    Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

    For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of t…