PulseAugur

Apple research: Pruning training data boosts LLM fact memorization

Apple researchers have developed a data pruning technique that improves the factual memorization of large language models. Their method, detailed in a paper accepted at an ICLR 2026 workshop, addresses LLMs' difficulty retaining factual knowledge in their parameters, which can lead to hallucinations. By selecting training data based on training loss to limit the number of facts and flatten their distribution, they showed that a GPT2-Small model could memorize 1.3 times more entity facts than with standard training, matching the performance of a model ten times its size.
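
The summary gives only the high-level recipe (loss-guided selection that caps the number of facts and flattens the per-fact distribution), not the paper's exact algorithm. Below is a minimal Python sketch of that idea under assumed inputs: `examples`, `losses`, `facts`, `max_per_fact`, and `max_facts` are all hypothetical names, and the heuristic of preferring low-loss facts is our assumption, not necessarily the paper's criterion.

```python
from collections import defaultdict

def prune_training_data(examples, losses, facts, max_per_fact=4, max_facts=10_000):
    """Sketch of loss-guided data pruning as described in the summary.

    Assumptions (not from the paper): `examples` is a list of training
    texts, `losses` their per-example training losses from an earlier
    pass, and `facts` the entity fact each example expresses. Here we
    (a) cap how many distinct facts survive, preferring those the model
    fits most easily (lowest loss), and (b) cap examples per fact to
    flatten the per-fact frequency distribution.
    """
    # Group example indices under the fact they express.
    by_fact = defaultdict(list)
    for i, fact in enumerate(facts):
        by_fact[fact].append(i)

    # Score each fact by its best (lowest) example loss and keep only
    # the `max_facts` easiest facts -- this limits the fact count.
    ranked = sorted(by_fact, key=lambda f: min(losses[i] for i in by_fact[f]))
    kept_facts = ranked[:max_facts]

    # Within each kept fact, retain at most `max_per_fact` examples
    # (lowest-loss first) so no single fact dominates -- this flattens
    # the distribution of facts over the pruned dataset.
    kept = []
    for fact in kept_facts:
        idxs = sorted(by_fact[fact], key=lambda i: losses[i])[:max_per_fact]
        kept.extend(idxs)

    return [examples[i] for i in sorted(kept)]
```

One plausible reading of why this helps: with the per-fact distribution flattened, no single fact soaks up a disproportionate share of gradient updates, so the same parameter budget can fit more distinct facts.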

Summary written by gemini-2.5-flash-lite from 1 source.

Ranking reason: the cluster contains an academic paper from Apple Machine Learning Research detailing a new method for improving LLM factual memorization.

Read on Apple Machine Learning Research →


Coverage (1 source)

  1. Apple Machine Learning Research

    Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

    This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026. Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-in…