Researchers investigated the memorization tendencies of fill-in-the-middle (FIM) pretraining objectives in language models. By training comparable Llama 3.2 models with FIM and standard left-to-right (LTR) objectives on a corpus containing repeated text, they observed that FIM models were more likely to recall short or partial spans, while LTR models favored exact continuations. The study found that verbatim recall in FIM-trained models scaled linearly with text repetitions and remained dependent on prefix context, not just suffix context.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Understanding memorization dynamics in FIM pretraining can inform the development of language models that are more robust and less prone to verbatim recall.
RANK_REASON The cluster contains an academic paper detailing research findings on language model pretraining objectives.