AI companies are reportedly buying physical copies of older books from secondhand bookstores, particularly titles that have not yet been digitized. The books are then scanned to produce training data for AI models, and the physical copies are often destroyed afterward. The practice raises questions about data sourcing and the preservation of physical media.
IMPACT Buying and scanning undigitized books is a novel way to acquire AI training data; it could affect the value of physical media and raises ethical concerns.
RANK_REASON The item discusses a practice by AI companies rather than a direct release or research finding.
Source: Mastodon (fosstodon.org)
AI-generated summary · Google Gemini · from 1 source.