PulseAugur
LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

New LakeQA benchmark challenges LLMs with search and reasoning over massive data

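Researchers have introduced LakeQA, a new benchmark designed to test how well large language models search and reason over a massive data lake. The benchmark draws on roughly 9.5 TB of heterogeneous data, including Wikipedia and government datasets, and requires multi-hop reasoning and the combination of evidence across multiple sources. Initial experiments show that even advanced models such as GPT-5.2 struggle with the task, reaching an exact-match score of only 18.37%, which underscores how challenging LakeQA is for building effective LLM agents.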

Impact: Establishes a new, challenging benchmark for evaluating how well LLM agents search and reason over large, unstructured datasets.

Ranking rationale: This cluster contains a research paper introducing a new benchmark for LLM evaluation.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI · Haonan Wang, Jiaxiang Liu, Yurong Liu, Austin Senna Wijaya, Tianle Zhou, Eden Wu, Yijia Chen, Wanting You, Reya Vir, Daniela Pinto, Grace Fan, Yusen Zhang, Juliana Freire, Eugene Wu

    LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

    arXiv:2606.10460v1 (cross-listed). Abstract: Recent large language models (LLMs) have shown rapid progress in reading-based question answering (QA), where evidence is explicitly provided or can be trivially retrieved. In contrast, real-world questions are often not paired wi…

  2. arXiv cs.CL · Eugene Wu

    LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

    Recent large language models (LLMs) have shown rapid progress in reading-based question answering (QA), where evidence is explicitly provided or can be trivially retrieved. In contrast, real-world questions are often not paired with accurate evidence documents. The useful evidenc…