PulseAugur

New corpus aids LLM translation for endangered Komi-Yazva language

Researchers have developed the first parallel corpus and evaluation protocol for translation between Komi-Yazva, an endangered and extremely low-resource language, and Russian. The dataset contains 457 aligned sentence pairs drawn from 74 narrative texts and is designed to support leakage-aware evaluation of large language models (LLMs). Experiments with this setup showed that LLMs can produce meaningful translations. However, performance varies significantly by model and by prompting strategy, and few-shot prompting consistently improves on zero-shot prompting.
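The summary does not say how the protocol keeps evaluation leakage-aware. The sketch below assumes one common interpretation: a few-shot demonstration must not come from the same narrative text as the sentence under test, so that no shared context from that text reaches the model. `Pair`, `text_id`, `select_demonstrations` and `build_prompt` are hypothetical names, the prompt wording is an assumption, and the strings in the usage example are placeholders rather than corpus data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    text_id: int      # hypothetical: index of the source narrative text
    komi_yazva: str   # source sentence
    russian: str      # reference translation


def select_demonstrations(pairs, test_idx, k, seed=0):
    """Pick up to k demonstrations, excluding every pair from the test item's text.

    Assumption: "leakage" means sharing a source narrative with the test sentence.
    """
    test_text = pairs[test_idx].text_id
    pool = [p for i, p in enumerate(pairs)
            if i != test_idx and p.text_id != test_text]
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    return rng.sample(pool, min(k, len(pool)))


def build_prompt(pairs, test_idx, k=0, seed=0):
    """k=0 yields a zero-shot prompt; k>0 prepends k leakage-filtered examples."""
    blocks = ["Translate from Komi-Yazva to Russian."]
    for demo in select_demonstrations(pairs, test_idx, k, seed):
        blocks.append(f"Komi-Yazva: {demo.komi_yazva}\nRussian: {demo.russian}")
    blocks.append(f"Komi-Yazva: {pairs[test_idx].komi_yazva}\nRussian:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder corpus: 9 pairs spread over 3 hypothetical texts (3 pairs each).
    corpus = [Pair(i // 3, f"src-{i}", f"tgt-{i}") for i in range(9)]
    print(build_prompt(corpus, test_idx=0, k=2))
```

Because the same function produces both prompt types, the zero-shot and few-shot conditions differ only in the number of demonstrations. This keeps the comparison between the two strategies controlled.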

IMPACT Provides a new benchmark and dataset for evaluating LLM translation capabilities in extremely low-resource language scenarios.

RANK_REASON The cluster contains an academic paper detailing a new dataset and evaluation protocol for LLM translation.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Petr Parshakov

    A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

    arXiv:2606.06420v1 · Abstract: We present the first Komi-Yazva–Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs fr…

  2. arXiv cs.CL TIER_1 English(EN) · Petr Parshakov

    A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

    We present the first Komi-Yazva–Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts and is accompanied by docu…