Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 15h · [2 sources]

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

Researchers have developed a data synthesis method for building neural machine translation (NMT) models for low-resource Indigenous languages, using Q'eqchi' Mayan as a case study. They transformed dictionaries into a synthetic corpus and applied Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters to an mT5-base model, which achieved strong structural acquisition. However, the resulting model showed a significant gap in lexical grounding compared to organic language, indicating that synthetic data is effective for learning grammar but authentic data remains crucial for semantic refinement.
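To give a rough sense of why LoRA counts as parameter-efficient, the sketch below compares the trainable parameters of one full weight matrix with those of a low-rank LoRA update of the form W + BA, where B is d_out x r and A is r x d_in. The brief does not report the study's adapter rank or which matrices were adapted, so the rank of 8 and the 768 x 768 matrix shape are illustrative assumptions, and `lora_param_counts` is a hypothetical helper name.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: parameters updated when the whole d_out x d_in matrix is fine-tuned.
    lora: parameters in the two low-rank factors B (d_out x rank) and
          A (rank x d_in) that LoRA trains while the original matrix is frozen.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Illustrative values only (assumptions, not taken from the study):
# a 768 x 768 projection matrix and an adapter rank of 8.
full, lora = lora_param_counts(d_in=768, d_out=768, rank=8)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA adapter:     {lora} parameters")
print(f"share trained:    {100 * lora / full:.2f}%")
```

Under these assumed values the adapter trains about 2% of the parameters in that one matrix; the actual savings depend on the rank and on how many matrices carry adapters.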

IMPACT Demonstrates a viable method for creating translation models for endangered languages while preserving linguistic data sovereignty.

LoRA
Alexander Chulzhanov
Q'eqchi' Mayan
mT5-base