PulseAugur

New technique reveals open-weight LLMs can memorize entire copyrighted books

A new study on arXiv describes a method for extracting memorized book content from open-weight language models. The researchers found that most models do not memorize most books extensively, but there are significant exceptions: Llama 3.1 70B has memorized some titles, such as 'Harry Potter and the Sorcerer's Stone', in full. For those titles, the entire book can be extracted deterministically from minimal prompts, a finding that bears on ongoing copyright disputes.

IMPACT Findings could influence copyright litigation and model training practices regarding memorization of copyrighted material.

RANK_REASON Academic paper detailing a new method for extracting memorized content from LLMs.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · A. Feder Cooper, Mark A. Lemley, Allison Casasola, Ahmed Ahmed, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Daniel E. Ho, Percy Liang

    Extracting memorized pieces of (copyrighted) books from open-weight language models

    arXiv:2505.12546v5. Abstract: Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) memorize protected expression from books in their training data. We s…