PulseAugur

New benchmark tests LLMs on Word, Excel, and PowerPoint comprehension

Researchers have introduced the Office Comprehension Benchmark (OCB), an evaluation suite that assesses how well large language models understand native Microsoft Office file formats (.docx, .xlsx, .pptx). The benchmark has two tracks: File Fidelity Q&A, which tests a model's ability to perceive structural and visual elements within documents, and Domain Q&A, which evaluates expert-level reasoning across 12 professional domains. In initial testing, even top-tier frontier systems reached only about 59.3% accuracy on the Domain Q&A track, leaving significant room for improvement in complex document comprehension.

IMPACT This benchmark could drive improvements in LLMs' ability to process and reason over complex, real-world business documents.

RANK_REASON The cluster describes a new academic benchmark for evaluating LLM capabilities on specific document types.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Firoz Shaik, Mateus Picanço Lima Gomes, Tanvir Aumi, Jingci Wang, Milos Milunovic, Filip Basara, Ivana Jovanovic, Vishwas Suryanarayanan, Neha Nandan Kenkare, Weiyao Xie, Zhipeng Han, Zheng Zhang, Waleed Shahid, Jay Rathi, Russell Scherer, Thong Q. N…

    Office Comprehension Benchmark

    arXiv:2607.01245v1 Announce Type: cross Abstract: We introduce Office Comprehension Bench (OCB), the first public benchmark to jointly evaluate LLM systems on Word, Excel, and PowerPoint comprehension over native file formats (.docx, .xlsx, .pptx) and their variants. OCB consists…