Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan
Researchers have developed a novel data synthesis method for building neural machine translation (NMT) models for low-resource Indigenous languages, with Q'eqchi' Mayan as the case study. They transformed dictionaries into a synthetic corpus and applied Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters to an mT5-base model, which yielded strong structural acquisition. However, the resulting model showed a significant gap in lexical grounding when compared against organic language, indicating that synthetic data is effective for learning grammar while authentic data remains crucial for semantic refinement.
AI IMPACT: Demonstrates a viable method for creating translation models for endangered languages while preserving linguistic data sovereignty.
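The summary does not say how the dictionaries were turned into a corpus, so the following is only a minimal sketch of one plausible approach: filling sentence templates with dictionary entries to produce parallel pairs. The entries, the templates, the word order, and the function name `synthesize_pairs` are all hypothetical placeholders for illustration. They are not the researchers' data or pipeline, and the source-side strings are not real Q'eqchi' words.

```python
import random

# Hypothetical dictionary entries. The "source" values are placeholders,
# NOT real Q'eqchi' vocabulary.
DICTIONARY = [
    {"source": "SRC_NOUN_1", "target": "house", "pos": "noun"},
    {"source": "SRC_NOUN_2", "target": "dog", "pos": "noun"},
    {"source": "SRC_VERB_1", "target": "sees", "pos": "verb"},
]

# Hypothetical templates as (source pattern, target pattern) pairs.
# The source-side word order here is arbitrary, not a linguistic claim.
TEMPLATES = [
    ("{verb_src} {noun_src}", "it {verb_tgt} the {noun_tgt}"),
]


def synthesize_pairs(dictionary, templates, n_pairs, seed=0):
    """Build n_pairs synthetic parallel sentences by filling templates."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    nouns = [entry for entry in dictionary if entry["pos"] == "noun"]
    verbs = [entry for entry in dictionary if entry["pos"] == "verb"]
    pairs = []
    for _ in range(n_pairs):
        src_template, tgt_template = rng.choice(templates)
        noun = rng.choice(nouns)
        verb = rng.choice(verbs)
        pairs.append({
            "source": src_template.format(
                verb_src=verb["source"], noun_src=noun["source"]
            ),
            "target": tgt_template.format(
                verb_tgt=verb["target"], noun_tgt=noun["target"]
            ),
        })
    return pairs
```

Calling `synthesize_pairs(DICTIONARY, TEMPLATES, n_pairs=5)` returns five dictionaries with `source` and `target` keys, which could then serve as fine-tuning examples. A corpus built this way exposes a model to sentence structure but only to the word senses listed in the dictionary, which is consistent with the gap in lexical grounding reported above.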