SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
Researchers have introduced SkMTEB, a new benchmark designed to evaluate text embedding models specifically for the Slovak language. The benchmark includes 31 datasets across 7 task types, significantly expanding evaluation coverage for this low-resource language. The study found that large multilingual models performed best, while existing Slovak-specific NLU models did not transfer well to embedding tasks. To address this, the team developed two open-source Slovak embedding models, e5-sk-small and e5-sk-large, which offer performance competitive with proprietary APIs while being locally deployable.
AI IMPACT: Provides a new evaluation framework and open-source models for Slovak-language AI applications, potentially enabling better semantic search and retrieval-augmented generation (RAG).