Rethinking Cross-lingual Gaps from a Statistical Viewpoint
Researchers have proposed a new statistical viewpoint for understanding cross-lingual gaps in large language models (LLMs). Rather than focusing on training failures, the work hypothesizes that the variance of a model's responses in the target language is a key cause of its lower accuracy compared with the source language. The study formalizes the cross-lingual gap in terms of biased and unbiased errors. It shows that controlling response variance can improve source-target transfer scores by up to 12 absolute points.
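The variance hypothesis can be illustrated with a toy simulation. This is a minimal sketch under assumed conditions, not the paper's model or method. Each question is assumed to have a latent confidence margin that is identical in every language, so there is no language-specific bias. The target language is assumed only to add more zero-mean noise to each response. Averaging several sampled responses is used here as one hypothetical way of reducing that variance. The function names, noise levels and margin distribution are all illustrative choices.

```python
import math
import random


def accuracy(noise_std: float, n_samples: int = 1,
             n_questions: int = 20_000, seed: int = 0) -> float:
    """Fraction of questions answered correctly in a toy bias-variance model.

    Hypothetical setup (an assumption, not the paper's model): each question has
    a latent margin m ~ N(1, 1) that is the SAME in every language, so there is
    no language-specific bias. A response is correct when its observed margin
    m + noise is positive, where noise ~ N(0, noise_std**2) stands in for
    response variance. Averaging n_samples independent noisy margins shrinks
    the noise variance by a factor of 1 / n_samples.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_questions):
        m = rng.gauss(1.0, 1.0)  # shared, language-independent margin
        noisy = [m + rng.gauss(0.0, noise_std) for _ in range(n_samples)]
        if sum(noisy) / n_samples > 0:
            correct += 1
    return correct / n_questions


def expected_accuracy(noise_std: float, n_samples: int = 1) -> float:
    """Closed form for the same toy model: Phi(1 / sqrt(1 + noise_std**2 / n_samples))."""
    z = 1.0 / math.sqrt(1.0 + noise_std**2 / n_samples)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    source = accuracy(noise_std=0.2)                    # low-variance "source" responses
    target = accuracy(noise_std=2.0)                    # same margins, noisier "target" responses
    target_avg = accuracy(noise_std=2.0, n_samples=8)   # variance reduced by averaging
    print(f"source={source:.3f} target={target:.3f} target_avg8={target_avg:.3f}")
```

The mean margin is the same in both languages, so any accuracy drop in this toy comes entirely from the extra variance. Reducing that variance recovers part of the gap. The closed form is included as a check on the simulation and holds only under these Gaussian assumptions.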
IMPACT: This research offers a new framework for understanding, and potentially mitigating, cross-lingual limitations in LLMs, which could improve their performance in multilingual applications.