PulseAugur

Anna's Archive Guides AI Crawlers with llms.txt

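Anna's Archive has published an `llms.txt` file that directs AI crawlers away from its main website and toward bulk data endpoints. The move aims to reduce the server load caused by CAPTCHA-solving bots and may generate revenue through enterprise-level data access. The convention, modeled on `robots.txt`, is being adopted by other sites to give large language models a curated content index or simple instructions, although it has no enforcement mechanism.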

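Impact: Establishes a new convention for how AI crawlers interact with websites, which could improve data access and reduce scraping friction.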

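Ranking rationale: Discusses a new convention for AI crawlers and its adoption by specific sites; there is no direct model release or major industry event.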


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. dev.to (LLM tag) · Thousand Miles AI

    Anna's Archive llms.txt: a routing guide for LLM crawlers

    Anna's Archive published a page on February 18, 2026 with one specific addressee: LLM crawlers. The site holds 64,416,225 books and 95,689,473 papers, has been served behind CAPTCHAs designed to deter bulk scraping, and has now written a polite, machine-readable note asking mo…

  2. dev.to (LLM tag) · Alan West

    llms.txt and the Quiet Pact Between Sites and Crawlers

    I stumbled onto the Anna's Archive post about `llms.txt` last week and it kicked off a whole evening of me poking around my own projects. The premise is simple: a plain-text file at the root of your domain that tells LLM crawlers what they should and shouldn't do. T…
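
To make the "file at the root of your domain" premise concrete, here is a minimal Python sketch of how a well-behaved crawler might locate a site's `llms.txt` and pull out any links it lists. The excerpts above only say the file is plain text at the domain root. The Markdown-style link format, the `SAMPLE` text, the `example.org` URLs, and the helper names `llms_txt_url` and `extract_links` are all assumptions made for illustration, not details taken from Anna's Archive or either article. Fetching the file over the network is deliberately left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level llms.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


# Assumption: resources are listed as Markdown-style links, [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(llms_txt: str) -> list[tuple[str, str]]:
    """Collect (label, url) pairs from Markdown-style links in the file text."""
    return LINK_RE.findall(llms_txt)


# Hypothetical file contents, invented for this example.
SAMPLE = """# Example Site
Please do not crawl the HTML pages. Use the bulk endpoint instead.
- [Bulk data](https://example.org/bulk)
"""

if __name__ == "__main__":
    print(llms_txt_url("https://example.org/books/123?x=1"))
    for label, url in extract_links(SAMPLE):
        print(label, url)
```

Because the convention has no enforcement mechanism, a sketch like this only helps crawlers that choose to look for the file and follow it.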