Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation
Researchers propose a method for training machine translation error detectors without human annotation. The approach, Iterative MBR (Minimum Bayes Risk) Distillation, has a large language model generate its own training data: the model's outputs serve as pseudo-labels in place of human-marked errors. In the reported experiments, models trained on this self-generated data outperform those trained on human-annotated datasets, particularly at identifying specific error spans.
AI IMPACT: This method could significantly reduce the cost and improve the consistency of training machine translation evaluation models.
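To make the pseudo-labeling idea concrete, here is a minimal sketch of generic MBR selection over candidate error-span annotations. It is an illustration under stated assumptions, not the paper's implementation: the character-level F1 utility, the `Span` representation, and the function names `char_f1` and `mbr_select` are all hypothetical choices made for this example. In MBR decoding, several candidate outputs are sampled, and the one that agrees most with the others on average is kept; in a distillation setup, that winner would then be used as a training label.

```python
from typing import List, Set, Tuple

# Assumption: an error span is a half-open character range [start, end)
# in the translated sentence.
Span = Tuple[int, int]


def span_chars(spans: List[Span]) -> Set[int]:
    """Flatten a list of spans into the set of character positions covered."""
    return {i for start, end in spans for i in range(start, end)}


def char_f1(hyp: List[Span], ref: List[Span]) -> float:
    """Character-level F1 between two annotations (an assumed utility)."""
    h, r = span_chars(hyp), span_chars(ref)
    if not h and not r:
        return 1.0  # both annotations mark no errors: full agreement
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[List[Span]]) -> List[Span]:
    """Return the candidate with the highest mean utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate annotation")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        # Treat every other candidate as a pseudo-reference.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(char_f1(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Three hypothetical sampled annotations for one translation.
samples = [[(0, 4)], [(0, 6)], [(0, 5)]]
pseudo_label = mbr_select(samples)  # the annotation closest to the consensus
```

The "iterative" part of the name suggests repeating this loop: fine-tune the model on the selected pseudo-labels, sample new candidates from the updated model, and select again. That outer loop needs a real model and is left out of the sketch.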