Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings
Researchers have developed EmbedFilter, a linear transformation technique to improve text embeddings generated by large language models. This method addresses the issue of embeddings being overly influenced by frequent, uninformative tokens, which hinders their semantic capture. By filtering out a subspace encoded by the unembedding matrix, EmbedFilter refines these embeddings, enhancing semantic quality and enabling significant dimensionality reduction for more efficient storage and retrieval. AI
IMPACT Enhances LLM embedding quality and efficiency, potentially improving performance on downstream tasks and reducing storage costs.