Researchers have introduced the Office Comprehension Benchmark (OCB), a new evaluation tool designed to assess how well large language models understand native Microsoft Office file formats (.docx, .xlsx, .pptx). The benchmark includes two tracks: File Fidelity Q&A, which tests a model's ability to perceive structural and visual elements within documents, and Domain Q&A, which evaluates expert-level reasoning across 12 professional domains. Initial testing revealed that even top-tier frontier systems achieved only around 59.3% accuracy on the Domain Q&A track, indicating significant room for improvement in complex document comprehension.
AI IMPACT: This benchmark could drive improvements in LLMs' ability to process and reason over complex, real-world business documents.
AI-generated summary · Google Gemini · from 1 source.