New Slovak Text Embedding Benchmark and Models Released

By PulseAugur Editorial · [2 sources] · 2026-06-11 17:50

Researchers have introduced SkMTEB, a new benchmark for evaluating text embedding models on the Slovak language. The benchmark comprises 31 datasets across 7 task types, nearly four times the depth of existing multilingual benchmark coverage for this low-resource language. The study found that large multilingual models performed best, while existing Slovak-specific NLU models did not transfer well to embedding tasks. To address this gap, the team developed two open-source Slovak embedding models, e5-sk-small and e5-sk-large, which offer performance competitive with proprietary APIs while remaining locally deployable.

IMPACT Provides a new evaluation framework and open-source models for Slovak language AI applications, potentially enabling better semantic search and RAG.
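
As a rough illustration of the semantic search use case mentioned above, the sketch below ranks documents by cosine similarity between a query embedding and document embeddings. It is not taken from the paper: the three-dimensional vectors, the document identifiers, and the helper names `cosine_similarity` and `rank_documents` are all hypothetical, and in practice the vectors would come from an embedding model such as those the paper describes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real model output (assumption).
doc_vecs = {
    "doc_bratislava": [0.9, 0.1, 0.0],
    "doc_hokej": [0.1, 0.8, 0.3],
    "doc_recept": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a retrieval-augmented generation setup, the top-ranked documents from a step like this would typically be passed to a language model as context.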

RANK_REASON The cluster describes a new academic paper introducing a benchmark and models for a specific language.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková, Viktória Ondrejová · 2026-06-12 04:00

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

arXiv:2606.13647v1 · Abstract: We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multilingual be…

arXiv cs.AI TIER_1 English(EN) · Viktória Ondrejová · 2026-06-11 17:50

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 …
