New DPO method boosts NMT model performance with preference-based post-training

By PulseAugur Editorial · [2 sources] · 2026-04-28 14:29

Researchers have developed a new post-training method for neural machine translation (NMT) systems that uses reinforcement learning and Direct Preference Optimization (DPO). The framework requires only a general text corpus and feedback from an expert translator, which can be either a human or an AI model. In experiments on English-to-German translation, applying this DPO-driven approach to the gemma3-1b model raised its COMET score from 0.703 to 0.747.
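For readers unfamiliar with DPO, here is a minimal sketch of the standard per-pair DPO objective. It is not necessarily the exact loss used in this paper, whose details the summary does not give. It assumes that each translation has already been scored as a summed token log-probability under both the model being trained (the policy) and a frozen copy of it (the reference). The function name, argument names and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    "chosen" is the translation the expert preferred, "rejected" the other one.
    Each *_logp argument is the summed token log-probability of that full
    translation under the trainable policy or the frozen reference model.
    beta=0.1 is an assumed default, not a value taken from the paper.
    """
    # How much more (in log space) the policy favours each output than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: z = 0, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifted toward the preferred translation: loss drops below log(2).
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Minimizing this loss pushes the policy to raise the likelihood of the preferred translation relative to the rejected one, while the reference terms keep it anchored to its starting behaviour.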

IMPACT Enhances NMT models through preference-based post-training and could improve translation accuracy for other language pairs, although only English-to-German was evaluated.

RANK_REASON This is a research paper detailing a new method for improving NMT models.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Mehrdad Ghassabi, Spehr Rajabi, Hamidreza Baradaran Kashani, Sadra Hakim, Mahshid Keivandarian · 2026-04-29 04:00

Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation

arXiv:2604.25702v1 Abstract: Contemporary neural machine translation (NMT) systems are almost exclusively built by training on supervised parallel data. Despite the tremendous progress achieved, these systems still exhibit persistent translation errors. This pa…
arXiv cs.CL TIER_1 English(EN) · Mahshid Keivandarian · 2026-04-28 14:29

Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation

Contemporary neural machine translation (NMT) systems are almost exclusively built by training on supervised parallel data. Despite the tremendous progress achieved, these systems still exhibit persistent translation errors. This paper proposes that a post-training paradigm based…