Researchers have investigated the emergence of cross-lingual generalization in large language models during multilingual pretraining. By analyzing a 1.7B-parameter model trained on nine languages with fine-grained checkpoints, they observed that linguistic capabilities and token-level copying develop concurrently. Translation skills emerge in two phases: an initial stage reliant on copying and surface similarities, followed by a second phase where more generalized translation mechanisms are formed while copying is refined. This study offers a detailed perspective on the development of cross-lingual abilities in multilingual models.
AI IMPACT Provides a fine-grained view of how cross-lingual generalization develops during multilingual pretraining, informing future model architectures and training strategies.
RANK_REASON Academic paper detailing a novel dataset and analysis of multilingual pretraining dynamics.
AI-generated summary · Google Gemini · from 1 source.