PulseAugur
New technique reveals open-weight LLMs can memorize entire copyrighted books

A new study on arXiv details a method for extracting memorized book content from open-weight language models. The researchers found that while most models do not memorize most books to any significant extent, there are notable exceptions: Llama 3.1 70B has fully memorized some titles, such as 'Harry Potter and the Sorcerer's Stone'. That degree of memorization allows deterministic extraction of entire books from minimal prompts, a finding with direct bearing on ongoing copyright disputes.
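The extraction idea can be illustrated with a toy calculation (a sketch for intuition only, not the authors' code; both function names here are hypothetical): if a model assigns the ground-truth continuation a probability at each token position, the chance of sampling the exact suffix is the product of those per-token probabilities, and repeated sampling drives the chance of at least one exact reproduction toward 1 for heavily memorized text.

```python
def suffix_probability(token_probs):
    """Probability that sampling reproduces the exact ground-truth suffix:
    the product of the per-token probabilities the model assigns to it."""
    p = 1.0
    for tp in token_probs:
        p *= tp
    return p

def extraction_probability(token_probs, n_attempts):
    """Chance that at least one of n independent samples yields the suffix."""
    p = suffix_probability(token_probs)
    return 1.0 - (1.0 - p) ** n_attempts

# Two tokens at probability 0.5 each:
print(suffix_probability([0.5, 0.5]))        # → 0.25
print(extraction_probability([0.5, 0.5], 2)) # → 0.4375

# A memorized 50-token span where every token is near-certain
# makes extraction essentially deterministic in one attempt.
print(extraction_probability([0.999] * 50, 1))
```

This is why "minimal prompts" suffice for fully memorized titles: when per-token probabilities sit near 1 across long spans, the product barely decays and a single greedy or sampled generation reproduces the text.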

Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →

IMPACT Findings could influence copyright litigation and model training practices regarding memorization of copyrighted material.

RANK_REASON Academic paper detailing a new method for extracting memorized content from LLMs.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · A. Feder Cooper, Mark A. Lemley, Allison Casasola, Ahmed Ahmed, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Daniel E. Ho, Percy Liang

    Extracting memorized pieces of (copyrighted) books from open-weight language models

    arXiv:2505.12546v5 (announce type: replace). Abstract: Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) memorize protected expression from books in their training data. We s…