PulseAugur

340+ US News Sites Block Internet Archive Over AI Training Data Fears

More than 340 local news sites in the United States are limiting the Internet Archive's ability to access and preserve their stories. The publishers are concerned that AI companies might scrape the nonprofit's repositories of archived journalism for training data. The move reflects growing anxiety among news publishers about how their work is used in AI development.

IMPACT News publishers are pushing back against AI data scraping, which could reduce the archived news content available for future model training.

RANK_REASON Significant number of news organizations taking action due to AI concerns.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. Mastodon · mastodon.social · TIER_1 · English (EN)

    "More than 340 local news sites across the US are now limiting the #InternetArchive’s ability to access and preserve their stories due to concerns that #AI companies might scrape the nonprofit’s repositories for training data" https://www.niemanlab.org/2026/05/more-than-340-…

  2. Mastodon · mastodon.social · TIER_1 · English (EN) · oaklandprivacy

    "More than 340 local news sites across the US are now limiting the #InternetArchive’s ability to access and preserve their stories due to concerns that #AI companies might scrape the nonprofit’s repositories for training data" https://www.niemanlab.org/2026/05/more-than-340-…