PulseAugur

FIM pretraining increases language model verbatim memorization

Researchers investigated how the fill-in-the-middle (FIM) pretraining objective affects verbatim memorization in language models, compared with standard left-to-right (LTR) training. In a controlled study that pretrained matched Llama 3.2 models on a corpus containing repeated text, they found that verbatim extraction under FIM training scales linearly with the number of data repetitions. The study also found that FIM's recall depends strongly on the prefix context. Measuring memorization reliably therefore requires careful choice of span length and probe format.
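The summary contrasts FIM with LTR training but does not show what an FIM sample or a memorization probe looks like. The Python code below is a minimal sketch of both. It implements the widely used prefix-suffix-middle (PSM) reordering and two probe formats: prefix-only and prefix-plus-suffix. It also includes an exact-match check for verbatim recall. Several details are illustrative assumptions and do not come from the paper, whose exact setup the summary does not describe: the sentinel strings, the character-level split, the `fim_rate` parameter, and every function name.

```python
import random

# Hypothetical sentinel strings for illustration only; real tokenizers define
# their own special tokens, and the paper's exact format is not given here.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(text: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Return an LTR sample or a PSM-ordered FIM sample built from `text`.

    With probability `fim_rate`, cut `text` at two distinct random character
    positions into (prefix, middle, suffix) and emit them in the order
    prefix, suffix, middle, so a causal model learns to produce the middle
    after seeing both sides. Otherwise return the text unchanged.
    """
    if not text:
        raise ValueError("text must be non-empty")
    if rng.random() >= fim_rate:
        return text  # ordinary left-to-right training sample
    i, j = sorted(rng.sample(range(len(text) + 1), 2))  # i < j, so middle is non-empty
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def ltr_probe(text: str, start: int, span: int) -> tuple[str, str]:
    """Prefix-only probe: the model sees text[:start] and must continue it."""
    return text[:start], text[start:start + span]


def fim_probe(text: str, start: int, span: int) -> tuple[str, str]:
    """Infilling probe: the model sees prefix and suffix, then must emit the middle."""
    prefix = text[:start]
    target = text[start:start + span]
    suffix = text[start + span:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}", target


def is_verbatim(generated: str, target: str) -> bool:
    """Exact-match check: the generation begins with the target span verbatim."""
    return generated[:len(target)] == target
```

One way to see why span length and probe format matter is to compare `is_verbatim` hit rates under `ltr_probe` and `fim_probe` across several `span` values for the same text. This is only a sketch of that comparison. The paper's actual evaluation protocol may differ.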

IMPACT This research clarifies how specific pretraining methods influence model memorization, potentially guiding future model development towards desired recall behaviors.

RANK_REASON The cluster contains an academic paper detailing research findings on language model pretraining objectives.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Tobias von Arx, Tanguy Dieudonné

    Memorization Dynamics of Fill-in-the-Middle Pretraining

    arXiv:2605.22981v1 (cross-list). Abstract: Fill-in-the-middle (FIM) is a pretraining objective widely used to equip causal language models with infilling ability, yet its effect on verbatim memorization remains underexplored. We study the memorization dynamics of FIM in a …

  2. arXiv cs.CL TIER_1 English(EN) · Tanguy Dieudonné

    Memorization Dynamics of Fill-in-the-Middle Pretraining

    Fill-in-the-middle (FIM) is a pretraining objective widely used to equip causal language models with infilling ability, yet its effect on verbatim memorization remains underexplored. We study the memorization dynamics of FIM in a controlled setting by pretraining matched Llama 3.…