New Benchmark Tests LLM Comprehension of Word, Excel, and PowerPoint

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

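Researchers have introduced the Office Comprehension Benchmark (OCB), a new evaluation for measuring how well large language models understand native Microsoft Office file formats (.docx, .xlsx, .pptx). The benchmark has two parts: File Fidelity Q&A, which tests a model's ability to perceive structural and visual elements within a document, and Domain Q&A, which assesses expert-level reasoning across 12 professional domains. Initial testing shows that even top frontier systems reach only about 59.3% accuracy on the Domain Q&A portion, indicating substantial room for improvement in complex document comprehension.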
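To make "native file format" concrete: a .docx, .xlsx, or .pptx file is a ZIP package of XML parts, so structural elements live in markup rather than in plain text. The sketch below is illustrative only and is not OCB's method. It builds a toy package in memory containing just `word/document.xml` (not a complete Word file) and reads the paragraphs back. The helper names `build_toy_docx` and `extract_paragraphs` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_toy_docx(paragraphs):
    """Hypothetical helper: pack paragraphs into a minimal .docx-like ZIP.

    Only word/document.xml is written, so Word itself would not open this;
    it exists purely to show the package layout.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    document_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def extract_paragraphs(docx_bytes):
    """Hypothetical helper: return the text of each <w:p> paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


toy = build_toy_docx(["Quarterly revenue", "Costs & margins"])
paragraphs = extract_paragraphs(toy)
```

A plain-text dump like this discards formatting, layout, and embedded objects, which is the kind of information the File Fidelity Q&A portion is described as probing.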

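Impact: The benchmark is expected to drive improvements in LLMs' ability to process and reason over complex, real-world business documents.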

Ranking rationale: This cluster describes a new academic benchmark for evaluating LLM capabilities on specific document types.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Firoz Shaik, Mateus Picanço Lima Gomes, Tanvir Aumi, Jingci Wang, Milos Milunovic, Filip Basara, Ivana Jovanovic, Vishwas Suryanarayanan, Neha Nandan Kenkare, Weiyao Xie, Zhipeng Han, Zheng Zhang, Waleed Shahid, Jay Rathi, Russell Scherer, Thong Q. N… · 2026-07-03 04:00

Office Comprehension Benchmark

arXiv:2607.01245v1 · Abstract: We introduce Office Comprehension Bench (OCB), the first public benchmark to jointly evaluate LLM systems on Word, Excel, and PowerPoint comprehension over native file formats (.docx, .xlsx, .pptx) and their variants. OCB consists…
