Anna's Archive Uses llms.txt to Guide AI Crawlers

By the PulseAugur editorial team · [2 sources] · 2026-05-23 02:10

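Anna's Archive has published an `llms.txt` file that directs AI crawlers away from its main website and toward bulk data endpoints. The move is intended to reduce the server load caused by CAPTCHA-solving bots, and it may also generate revenue through enterprise-level data access. The convention, modeled on `robots.txt`, is being adopted by other sites to give large language models a curated content index or simple instructions, although it has no enforcement mechanism.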
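
As a rough illustration of the convention described above, the sketch below shows how a cooperating crawler might locate a site's root-level `llms.txt` and pull a link index out of it. It assumes the file uses Markdown-style links, which the sources here do not confirm for any particular site. The function names, the `SAMPLE` text, and the `example.com` URLs are all hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level llms.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


# Matches Markdown links of the form [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(llms_txt: str) -> list[tuple[str, str]]:
    """Collect (label, url) pairs from Markdown-style links in the file body."""
    return LINK_RE.findall(llms_txt)


# Hypothetical file content, invented for illustration only.
SAMPLE = """# Example Site

> Please use the bulk endpoints below instead of crawling HTML pages.

- [Bulk data](https://example.com/bulk): full dumps
- [API docs](https://example.com/api.md)
"""

if __name__ == "__main__":
    print(llms_txt_url("https://example.com/books/123?page=2"))
    print(extract_links(SAMPLE))
```

Nothing in this sketch obliges a crawler to follow what it reads. As with `robots.txt`, honoring the file is voluntary.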

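Impact: Establishes a new convention for how AI crawlers interact with websites, which could improve data access and reduce scraping friction.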

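Ranking rationale: Discusses a new convention for AI crawlers and its adoption by specific sites; it involves no direct model release or major industry event.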

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

dev.to — LLM tag TIER_1 English(EN) · Thousand Miles AI · 2026-05-23 04:39

Anna's Archive llms.txt: A Routing Guide for LLM Crawlers

Anna's Archive published a page on February 18, 2026 with one specific addressee: LLM crawlers. The site holds 64,416,225 books and 95,689,473 papers, has been served behind CAPTCHAs designed to deter bulk scraping, and has now written a polite, machine-readable note asking mo…
dev.to — LLM tag TIER_1 English(EN) · Alan West · 2026-05-23 02:10

llms.txt and the Quiet Pact Between Sites and Crawlers

I stumbled onto the Anna's Archive post about `llms.txt` last week and it kicked off a whole evening of me poking around my own projects. The premise is simple: a plain-text file at the root of your domain that tells LLM crawlers what they should and shouldn't do. T…
