PulseAugur

AI Bots Ignore Robots.txt, Attempt Database Scans

Several AI-driven web crawlers, including Anthropic's Claude crawler and OpenAI's GPTBot, were reported to have ignored a server's robots.txt directives and attempted to scan its database. The server administrator blocked these bots, along with others from Baidu, Amazon, Meta, and Yandex. The administrator expressed frustration, stating that these large corporations are attempting to steal resources and that a simultaneous surge of such bots could render servers unusable, citing a recent incident with their PieFed server.

IMPACT AI crawlers are aggressively scraping data, potentially impacting server resources and data privacy for smaller platforms.

RANK_REASON This is a user's complaint about AI bots, not a direct release or announcement from a frontier lab.

Read on Mastodon — fosstodon.org →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Mastodon — fosstodon.org TIER_1 English(EN)

    In the past 24 hours, these spiders have ignored my robots.txt file and tried scanning the database anyway. They were all blocked. Claude (thousands more attempts than the others), Baiduspider, Amazonbot, Bytespider, gptbot, Meta-ExternalAgent, YandexBot, ChatGPT, ByteSpider, CommonCrawl …
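
The post does not say how the crawlers were blocked. As a rough illustration only, the sketch below shows one common approach: rejecting requests whose User-Agent header contains a known crawler name. The token list is taken from the names quoted in the post. The function names (`is_blocked`, `status_for`), the case-insensitive substring matching, and the 403 response are assumptions made for this example, not the administrator's actual setup.

```python
from typing import Optional

# Hypothetical blocklist: lowercase forms of the crawler names quoted in the post.
# Matching on substrings of the User-Agent header is an assumption of this sketch.
BLOCKED_AGENT_TOKENS = (
    "claude",
    "baiduspider",
    "amazonbot",
    "bytespider",
    "gptbot",
    "meta-externalagent",
    "yandexbot",
    "chatgpt",
    "commoncrawl",
)


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains any blocked crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def status_for(user_agent: Optional[str]) -> int:
    """Pick an HTTP status code: 403 for blocked crawlers, 200 otherwise."""
    return 403 if is_blocked(user_agent) else 200
```

A server would call `status_for(request_user_agent)` before doing any database work. User-Agent strings are self-reported and easy to spoof, so a check like this only stops crawlers that identify themselves honestly.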