Databricks faces 'extraordinary' copyright damages in author lawsuit over LLM training data

By PulseAugur Editorial · [1 source] · 2026-04-29 18:05

A U.S. judge has allowed a class-action lawsuit to proceed against Databricks, alleging that its DBRX large language model was trained on pirated copyrighted books. The authors claim that MosaicML, which Databricks acquired, used the RedPajama dataset, which contains approximately 196,000 titles, including their works. Databricks has argued that the authors cannot prove DBRX was trained on this specific data, but the judge requires further information before determining whether copyright infringement occurred.

IMPACT: The potential for significant damages in copyright infringement cases could change how LLM developers acquire training data.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

The Register (AI) · O'Ryan Johnson · 2026-04-29 18:05

Databricks can't seem to shake authors' copyright claim that could result in 'extraordinary' damages

Authors say it acquired an LLM that was trained on their copyrighted data, and judge keeps asking for more info

Databricks cannot shake a class action lawsuit targeting its LLM, which several book authors contend was created with a database that contained pirated vers…
