PulseAugur

New technique reveals open-weight LLMs can memorize entire copyrighted books

A new study on arXiv details a method for extracting memorized book content from open-weight language models. The researchers found that most models do not extensively memorize most books, but there are significant exceptions: Llama 3.1 70B has memorized some titles, such as 'Harry Potter and the Sorcerer's Stone', almost entirely. Memorization at this scale allows entire books to be extracted deterministically from minimal prompts, a finding with direct implications for ongoing copyright disputes.

Impact: The findings could influence copyright litigation and model-training practices concerning the memorization of copyrighted material.

Ranking rationale: Academic paper detailing a new method for extracting memorized content from LLMs.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CL · English · A. Feder Cooper, Mark A. Lemley, Allison Casasola, Ahmed Ahmed, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Daniel E. Ho, Percy Liang

    Extracting memorized pieces of (copyrighted) books from open-weight language models

    arXiv:2505.12546v5 · Abstract: Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) memorize protected expression from books in their training data. We s…