PulseAugur

Databricks faces 'extraordinary' copyright damages in author lawsuit over LLM training data

A U.S. judge has allowed a class-action lawsuit against Databricks to proceed. The suit alleges that the company's DBRX large language model was trained on pirated copyrighted books. The authors claim that Databricks acquired MosaicML, which used the RedPajama dataset containing approximately 196,000 titles, including their works. Databricks has argued that the authors cannot prove DBRX was trained on this specific data, but the judge has asked for further information before determining whether copyright infringement occurred.

Impact: The potential for significant damages in copyright infringement cases could affect how companies acquire LLM training data.

Ranking rationale: A class-action lawsuit over copyright infringement in LLM training data is proceeding.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. The Register — AI TIER_1 English(EN) · O'Ryan Johnson ·

    Databricks can't seem to shake authors' copyright claim that could result in 'extraordinary' damages

    Authors say it acquired an LLM that was trained on their copyrighted data, and judge keeps asking for more info

    Databricks cannot shake a class action lawsuit targeting its LLM, which several book authors contend was created with a database that contained pirated vers…