AI models erase open-source attribution during training

By PulseAugur Editorial · [1 source] · 2026-05-30 15:28

Permissive open-source licenses such as MIT, BSD, and Apache typically require that a copyright notice crediting the original authors be preserved. When large language models are trained on code repositories released under these licenses, that authorship information is often stripped. The result effectively erases the identity of the human creators, even though the models can still reproduce their work.

IMPACT Concerns about AI models disregarding open-source licensing could lead to legal challenges and a re-evaluation of training data practices.

RANK_REASON The item is an opinion or analysis piece on what LLM training means for open-source licenses and attribution.

policy
other

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Mastodon · mastodon.social · TIER_1 · English (EN) · 2026-05-30 15:28

The Erasure of Attribution (MIT / BSD / Apache licenses) Permissive licenses ask for very little, often just a single copyright notice or an AUTHORS file preserved in the documentation. When an LLM ingests this repository, the authorship data is stripped. The model retains the ab…
