PulseAugur
Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

New research tackles citation accuracy in AI agents and the efficiency of document formats

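A new evaluation framework, "Cited but Not Verified," has been developed to assess source attribution in research agents built on large language models (LLMs). The framework parses and evaluates the inline citations in LLM-generated reports along three dimensions: link accessibility, content relevance, and factual accuracy. A benchmark of 14 LLMs shows that frontier models maintain high link validity and relevance, but the factual accuracy of their citations is markedly lower, especially as retrieval depth increases. Separately, to address the inefficiency of document handling in current LLM agents, a new file format called ObjectGraph (.og) has been proposed; it reimagines documents as traversable knowledge graphs rather than linear text.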
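As a rough illustration of the three-dimension scoring described above, the following Python sketch extracts inline citations from a report and computes the share of citations that pass each check. It is not the paper's implementation: the Markdown-style `[label](url)` citation pattern, the `CitationVerdict` record, and the `extract_citations` and `summarize` helpers are all hypothetical names chosen for this example, and the per-citation verdicts are assumed to come from some separate checker.

```python
import re
from dataclasses import dataclass

# Assumption: inline citations appear as Markdown links, [label](url).
CITATION_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


@dataclass
class CitationVerdict:
    """Hypothetical per-citation result, one flag per dimension."""
    url: str
    accessible: bool  # the link resolves
    relevant: bool    # the page is on topic for the cited claim
    accurate: bool    # the page actually supports the cited claim


def extract_citations(report: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every inline citation in the report."""
    return CITATION_RE.findall(report)


def summarize(verdicts: list[CitationVerdict]) -> dict[str, float]:
    """Return the fraction of citations that pass each of the three checks."""
    n = len(verdicts)
    if n == 0:
        return {"accessibility": 0.0, "relevance": 0.0, "accuracy": 0.0}
    return {
        "accessibility": sum(v.accessible for v in verdicts) / n,
        "relevance": sum(v.relevant for v in verdicts) / n,
        "accuracy": sum(v.accurate for v in verdicts) / n,
    }
```

Keeping the three rates separate, rather than folding them into one score, makes it possible to see the pattern the summary reports: links can be reachable and on topic while still failing to support the claim they are attached to.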

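Impact: New evaluation frameworks and file formats are emerging to improve the reliability and efficiency of LLM agents in research and information synthesis.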

Ranking rationale: This cluster contains two academic papers detailing a new framework and a new file format for LLM research.


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.CL TIER_1 English(EN) · Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld ·

    Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

    arXiv:2605.06635v1. Abstract: Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cit…

  2. arXiv cs.CL TIER_1 English(EN) · Corey Feld ·

    Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

    Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-…

  3. arXiv cs.AI TIER_1 English(EN) · Mohit Dubey, Open Gigantic ·

    ObjectGraph: From Document Injection to Knowledge Traversal - A Native File Format for the Agentic Era

    arXiv:2604.27820v1. Abstract: Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their contex…

  4. arXiv cs.AI TIER_1 English(EN) · Open Gigantic ·

    ObjectGraph: From Document Injection to Knowledge Traversal - A Native File Format for the Agentic Era

    Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, …