Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 1w · [2 sources]

A Komi-Yazva--Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

Researchers have released a parallel corpus and evaluation protocol for translation between Komi-Yazva, an endangered low-resource language, and Russian. The dataset contains 457 sentence pairs drawn from narrative texts and is designed to support leakage-aware evaluation of large language models. Experiments with this setup show that LLMs can produce meaningful translations, but performance varies significantly across models and prompting strategies, and few-shot prompting consistently outperforms zero-shot.

IMPACT Provides a new benchmark and dataset for evaluating LLM translation in extremely low-resource language settings.

LLM
Russian
Komi-Yazva