PulseAugur

Databricks faces 'extraordinary' copyright damages in author lawsuit over LLM training data

A U.S. judge has allowed a class-action lawsuit to proceed against Databricks. The suit alleges that the company's DBRX large language model was trained on pirated copyrighted books. The authors claim that Databricks acquired MosaicML, which used the RedPajama dataset containing approximately 196,000 titles, including their works. Databricks argues that the authors cannot prove DBRX was trained on this specific data, but the judge has asked for more information before deciding whether copyright infringement occurred.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT The potential for significant damages in copyright infringement cases could change how LLM developers acquire training data.

RANK_REASON A class-action lawsuit over copyright infringement in LLM training data is proceeding.


COVERAGE [1]

  1. The Register (AI) · TIER_1 · O'Ryan Johnson

    Databricks can't seem to shake authors' copyright claim that could result in 'extraordinary' damages

    Authors say it acquired an LLM that was trained on their copyrighted data, and judge keeps asking for more info

    Databricks cannot shake a class action lawsuit targeting its LLM, which several book authors contend was created with a database that contained pirated vers…