Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 15h · [2 sources]

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Researchers have introduced SkMTEB, a benchmark for evaluating text embedding models on the Slovak language. It comprises 31 datasets across 7 task types, significantly expanding evaluation coverage for this low-resource language. The study found that large multilingual models performed best, while existing Slovak-specific NLU models did not transfer well to embedding tasks. To address this, the team developed two open-source Slovak embedding models, e5-sk-small and e5-sk-large, which offer performance competitive with proprietary APIs while being locally deployable.

IMPACT Provides a new evaluation framework and open-source models for Slovak-language AI applications, potentially enabling better semantic search and retrieval-augmented generation (RAG).

MTEB
Slovak
e5-sk-small
e5-sk-large
Multilingual E5
SkMTEB