A U.S. judge has allowed a class-action lawsuit to proceed against Databricks, which alleges that its DBRX large language model was trained on pirated copyrighted books. The authors allege that MosaicML, which Databricks acquired, used the RedPajama dataset, which contains approximately 196,000 titles, including their works. Databricks has argued that the authors cannot prove DBRX was trained on this specific data, but the judge said more information is needed to determine whether copyright infringement occurred.
AI impact: The potential for significant damages in copyright infringement cases could affect how LLM developers acquire training data.
Ranking rationale: A class-action lawsuit over copyright infringement in LLM training data is proceeding.
- Anthropic
- Books3
- Brian Keene
- Databricks
- DBRX
- Hugging Face
- Jason Reynolds
- Judge Charles Breyer
- LLAMA
- Meta
- MosaicML
- Rebecca Makkai
- RedPajama
- Stewart O’Nan
- The Great Believers
- U.S. District Court in Northern California
AI-generated summary · Google Gemini · from 1 source.