MTEB

PulseAugur coverage of MTEB — every cluster mentioning MTEB across labs, papers, and developer communities, ranked by signal.

Total · 30d: 15 (15 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 12 (12 over 90d)

TIER MIX · 90D

significant 1
research 7
tool 6
commentary 1

SENTIMENT · 30D

4 days with sentiment data

RECENT · PAGE 1/1 · 15 TOTAL

COMMENTARY · CL_103119 · Jun 22 · 00:23

AI agents fail due to flawed search index distribution, not prompting

A common issue in AI agents is that their search results appear correct but lead to factually wrong answers due to problems with the underlying search index. This is not a prompting issue but a distribution problem, whe…

RESEARCH · CL_105011 · Jun 22 · 00:00

HAKARI-Bench offers lightweight evaluation for retrieval models · 2 sources tracked

Researchers have introduced HAKARI-Bench, a lightweight benchmark designed to streamline the evaluation of retrieval architectures and efficiency settings for retrieval-augmented generation and semantic search. This new…

RESEARCH · CL_86605 · Jun 11 · 17:50

New Slovak Text Embedding Benchmark and Models Released

Researchers have introduced SkMTEB, a new benchmark designed to evaluate text embedding models specifically for the Slovak language. This benchmark includes 31 datasets across 7 task types, significantly expanding cover…

RESEARCH · CL_72548 · Jun 4 · 08:30

New method enhances LLM text embeddings using text reversal

Researchers have introduced ReverseEOL, a novel method to enhance text embeddings generated by decoder-only large language models (LLMs) without additional training. This technique augments standard embeddings by incorp…

TOOL · CL_65810 · Jun 2 · 04:00

New research explores extreme text embedding compression

Researchers have investigated the combined impact of dimensionality reduction and quantization on compressing text embeddings. Their experiments, using four MTEB task families and four pretrained embedding models, show …

TOOL · CL_56190 · May 28 · 04:00

PromptEmbedder offers efficient, transferable text embeddings via dual-LLM prompting

Researchers have introduced PromptEmbedder, a new dual-LLM framework designed to improve the efficiency and transferability of text embeddings. This method decouples embedding knowledge from specific model weights by us…

RESEARCH · CL_56316 · May 27 · 09:11

New benchmarks and studies probe multilingual text embedding robustness

Researchers are exploring the robustness of multilingual text embeddings across various tasks and languages. One study introduces new indicators to assess how dataset composition and ranking methods affect model perform…

RESEARCH · CL_53958 · May 26 · 00:00

Google DeepMind unveils Gemini Embedding 2 multimodal model

Google DeepMind has introduced Gemini Embedding 2, a new native multimodal embedding model. This model can generate unified representations for video, audio, image, and text data, demonstrating strong zero-shot capabili…

RESEARCH · CL_43997 · May 21 · 09:05

Embedding models' structure predicts benchmark performance, study finds

Researchers have demonstrated that the organization of embedding spaces within high-performing models consistently predicts their benchmark performance. By evaluating 25 embedding models across five MTEB tasks, they fou…

TOOL · CL_39077 · May 19 · 00:00

Hugging Face releases Ettin Reranker models for improved search

Hugging Face has released a new family of six Ettin Reranker models, built on top of Ettin ModernBERT encoders. These models offer state-of-the-art performance for their respective sizes and are designed for the retriev…

TOOL · CL_22216 · May 8 · 04:00

LMEB benchmark evaluates long-horizon memory retrieval beyond traditional passage retrieval

Researchers have introduced the Long-horizon Memory Embedding Benchmark (LMEB), a new evaluation framework designed to assess the capabilities of embedding models in handling complex, long-horizon memory retrieval tasks…

TOOL · CL_15953 · May 5 · 04:00

Causal2Vec enhances decoder-only LLMs for embeddings without architecture changes

Researchers have introduced Causal2Vec, a novel method to enhance decoder-only large language models (LLMs) for embedding tasks without altering their core architecture. This approach involves pre-encoding input text in…

TOOL · CL_15862 · May 5 · 04:00

EPIC training method boosts LLM text encoder performance on MTEB benchmark

Researchers have developed a new training strategy called EPIC (Embedding-based In-Context Prompt Training) to improve the quality of text embeddings generated by large language models. This method reduces computational…

RESEARCH · CL_01537 · Oct 19 · 00:00

PL-MTEB benchmark introduced for Polish text embeddings

Researchers have introduced the Polish Massive Text Embedding Benchmark (PL-MTEB), a new evaluation suite designed to assess text embedding models specifically for the Polish language. This benchmark includes 30 diverse…

SIGNIFICANT · CL_01566 · Jan 24 · 08:00

OpenAI launches new embedding models with price cuts and performance boosts

OpenAI has released new embedding models, text-embedding-3-small and text-embedding-3-large, offering significant improvements in performance and efficiency over previous models like text-embedding-ada-002. These new mo…