For the Global Majority, the dilemma is sharp. Many languages and knowledge traditions are underrepresented in training data. But “fixing” that via ungove
Rohini argues that AI training data should be treated as a collectively stewarded resource, emphasizing community consent, attribution, and benefit-sharing. She highlights that public knowledge repositories like Wikipedia are exploited by LLMs without sustaining the commons, and that copyright reform alone is insufficient. For the Global Majority, underrepresentation in training data is a critical issue, and simply scraping data without community control constitutes 'extractive inclusion' and epistemic violence. AI
IMPACT: Calls for new AI governance models that prioritize community consent and benefit-sharing for training data.