New corpus aids LLM translation for endangered Komi-Yazva language

By PulseAugur Editorial · [2 sources] · 2026-06-04 17:26

Researchers have built the first parallel corpus and an explicit evaluation protocol for translation between Russian and Komi-Yazva, an endangered and extremely low-resource language. The dataset contains 457 aligned sentence pairs drawn from 74 narrative texts and is designed to support leakage-aware evaluation of large language models. In the reported experiments, LLMs produced meaningful translations, but performance varied considerably by model and prompting strategy, and few-shot prompting consistently outperformed zero-shot prompting.
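The summary does not say how the protocol guards against leakage or how its prompts are worded. As a rough illustration only, the sketch below assumes one common reading: whole narrative texts are held out, so few-shot examples never come from the same text as a test sentence. The names `Pair`, `split_by_text`, and `build_prompt`, the prompt wording, and the placeholder sentences are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    text_id: int  # hypothetical ID of the narrative text a sentence comes from
    source: str   # Komi-Yazva sentence (placeholder strings below)
    target: str   # Russian translation (placeholder strings below)


def split_by_text(pairs, test_fraction=0.2, seed=0):
    """Hold out whole texts so no text feeds both the example pool and the test set."""
    text_ids = sorted({p.text_id for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(text_ids)
    n_test = max(1, round(len(text_ids) * test_fraction))
    test_ids = set(text_ids[:n_test])
    pool = [p for p in pairs if p.text_id not in test_ids]
    test = [p for p in pairs if p.text_id in test_ids]
    return pool, test


def build_prompt(sentence, examples=()):
    """Zero-shot when `examples` is empty, few-shot otherwise (assumed wording)."""
    lines = ["Translate from Komi-Yazva to Russian."]
    for ex in examples:
        lines.append(f"Komi-Yazva: {ex.source}")
        lines.append(f"Russian: {ex.target}")
    lines.append(f"Komi-Yazva: {sentence}")
    lines.append("Russian:")
    return "\n".join(lines)


# Placeholder data, not sentences from the corpus: 5 texts with 3 pairs each.
pairs = [Pair(t, f"src-{t}-{i}", f"tgt-{t}-{i}") for t in range(5) for i in range(3)]
pool, test = split_by_text(pairs)
zero_shot = build_prompt(test[0].source)
few_shot = build_prompt(test[0].source, examples=pool[:2])
```

Splitting by text rather than by sentence is the design choice that matters here: sentences from one narrative tend to share vocabulary and names, so a sentence-level split could let near-duplicates of a test item appear among the few-shot examples.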

IMPACT Provides a new benchmark and dataset for evaluating LLM translation capabilities in extremely low-resource language scenarios.

RANK_REASON The cluster contains an academic paper detailing a new dataset and evaluation protocol for LLM translation.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Petr Parshakov · 2026-06-05 04:00

A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

arXiv:2606.06420v1. Abstract: We present the first Komi-Yazva–Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs fr…

arXiv cs.CL TIER_1 English(EN) · Petr Parshakov · 2026-06-04 17:26

A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

We present the first Komi-Yazva–Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts and is accompanied by docu…
