PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation

PixelRAG uses web page screenshots instead of text to improve LLM retrieval

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

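A new research paper introduces PixelRAG, a retrieval-augmented generation (RAG) method that augments large language models with web page screenshots rather than text. By operating directly in pixel space, it represents websites visually and bypasses conventional text parsing. PixelRAG has been scaled to a corpus of 30 million images and outperforms text-based RAG baselines on a range of tasks, including text-centric QA and multimodal QA. The method also improves efficiency through image compression, which may reduce token costs.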
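To make the idea concrete, here is a minimal sketch of retrieval over screenshots rather than parsed text. It assumes each screenshot has already been embedded as a vector and that candidates are ranked by cosine similarity to the query. The summary does not describe PixelRAG's actual encoder, index, or scoring, so the function names, the toy embeddings, and the cosine ranking are all illustrative assumptions, not the paper's method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_screenshots(query_vec, index, k=2):
    """Return the ids of the k screenshots whose embeddings best match the query.

    `index` maps a screenshot id to a precomputed embedding. This is a
    hypothetical structure; a real system would obtain the embeddings
    from a vision encoder and use an approximate nearest-neighbor index.
    """
    ranked = sorted(index, key=lambda sid: cosine(query_vec, index[sid]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors standing in for real screenshot embeddings (assumed values).
index = {
    "page_a.png": [0.9, 0.1, 0.0],
    "page_b.png": [0.0, 1.0, 0.0],
    "page_c.png": [0.7, 0.7, 0.0],
}

# The retrieved screenshots, not extracted text, would then be passed to the model.
top = retrieve_screenshots([1.0, 0.0, 0.0], index, k=2)
```

In this sketch the retrieval unit is the image itself, so no HTML parsing step appears anywhere in the pipeline; that is the property the summary attributes to the approach.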

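Impact: Challenges the necessity of text-based representations in LLM web retrieval, potentially improving both efficiency and performance.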

Ranking rationale: A paper introducing a novel retrieval-augmented generation method.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Yichuan Wang, Zhifei Li, Zirui Wang, Paul Teiletche, Lesheng Jin, Matei Zaharia, Joseph E. Gonzalez, Sewon Min · 2026-06-30 04:00

PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation

arXiv:2606.28344v1 Announce Type: cross Abstract: Augmenting large language models (LLMs) with retrieved web text has become a dominant paradigm, yet the web is not natively textual: existing systems depend on complex parsing pipelines that linearize HTML and discard layout, visu…
