Researchers investigated how the fill-in-the-middle (FIM) pretraining objective affects language model memorization compared with standard left-to-right (LTR) training. Using Llama 3.2 models and a corpus containing repeated text, the study found that under FIM training, verbatim extraction scales linearly with the number of data repetitions. It also found that FIM recall depends strongly on prefix context, and that evaluating memorization requires careful attention to span length and probe format.
IMPACT This research clarifies how the choice of pretraining objective influences model memorization, which could guide future model development toward desired recall behaviors.
AI-generated summary · Google Gemini · from 2 sources.