A Komi-Yazva--Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation
Researchers have released a parallel corpus and evaluation protocol for translation between Komi-Yazva and Russian, aimed at endangered and low-resource language settings. The dataset contains 457 sentence pairs drawn from narrative texts and is designed to support leakage-aware evaluation of large language models (LLMs). In experiments with this setup, LLMs produced meaningful translations, but performance varied substantially by model and prompting strategy, and few-shot prompting consistently outperformed zero-shot prompting.
AI IMPACT: Provides a new benchmark and dataset for evaluating LLM translation capabilities in extremely low-resource language scenarios.
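
To make the zero-shot versus few-shot distinction concrete, here is a minimal Python sketch of how such prompts might be assembled. It is not the authors' protocol: the `Pair` class, the `build_prompt` function, the prompt wording, and the leakage guard (dropping any demonstration whose source text matches the test sentence) are all illustrative assumptions, since the summary does not spell out these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One parallel sentence pair (hypothetical container, not from the paper)."""
    source: str  # Komi-Yazva sentence
    target: str  # Russian reference translation


def _norm(text: str) -> str:
    # Light normalization so trivially different copies still count as the same sentence.
    return " ".join(text.lower().split())


def build_prompt(test_source: str, demo_pool: list[Pair], k: int = 0) -> str:
    """Build a translation prompt; k=0 is zero-shot, k>0 is few-shot.

    Assumed leakage guard: a demonstration is skipped when its source
    sentence matches the sentence being translated.
    """
    demos = [p for p in demo_pool if _norm(p.source) != _norm(test_source)][:k]
    lines = ["Translate from Komi-Yazva to Russian."]
    for p in demos:
        lines.append(f"Komi-Yazva: {p.source}")
        lines.append(f"Russian: {p.target}")
    lines.append(f"Komi-Yazva: {test_source}")
    lines.append("Russian:")
    return "\n".join(lines)
```

Calling `build_prompt(sentence, pool, k=0)` yields a zero-shot prompt, while `k=3` would prepend up to three demonstration pairs. A real evaluation would likely need a stricter guard, for example excluding every sentence from the same narrative text as the test sentence.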