Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

New framework benchmarks enterprise AI document processing pipelines

By the PulseAugur editorial team · [3 sources] · 2026-04-29 07:48

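Researchers have developed EnterpriseDocBench, a new framework for evaluating the end-to-end performance of enterprise AI document processing pipelines. It measures parsing fidelity, indexing efficiency, retrieval relevance, and generation groundedness across six enterprise domains. Initial tests show that hybrid retrieval slightly outperforms BM25 and, surprisingly, that hallucination rates are higher for very short and very long documents than for medium-length ones. A key finding is that although factual accuracy is high, answer completeness is markedly lower, indicating that AI systems often omit key information.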
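
The summary does not say how the hybrid retriever combines lexical and dense signals. One common approach is reciprocal rank fusion (RRF), which merges a BM25 ranking with an embedding-based ranking by summing 1 / (k + rank) for each document. The sketch below is a minimal illustration under that assumption only: the two rankings are taken as already computed, the document IDs are hypothetical, and k = 60 is a commonly used default rather than a value from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks earlier.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings for one query: lexical (BM25) and dense (embeddings).
bm25_ranking = ["invoice_17", "policy_03", "contract_88"]
dense_ranking = ["policy_03", "contract_88", "memo_42"]

print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# → ['policy_03', 'contract_88', 'invoice_17', 'memo_42']
```

Documents that both retrievers rank reasonably well rise above a document that only BM25 ranks first, which is one plausible mechanism behind a hybrid's small edge over BM25 alone. Whether EnterpriseDocBench's hybrid setup uses RRF, weighted score interpolation, or something else is not stated in the summary.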

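Impact: highlights a critical gap in enterprise AI, where factual accuracy is high but answer completeness is low, a problem that affects real-world deployment.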
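
The accuracy/completeness gap is easiest to see as precision versus recall over facts: accuracy asks how many of the answer's claims are supported, while completeness asks how many of the facts a full answer needs were actually stated. The sketch below is a minimal illustration, assuming an answer has already been split into atomic claims and that each question has a hypothetical reference list of key facts. The exact string matching is a stand-in for whatever claim-verification method the benchmark actually uses, and the function and field names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AnswerScores:
    factual_accuracy: float  # share of the answer's claims supported by the source
    completeness: float      # share of the reference key facts the answer covers


def score_answer(answer_claims, source_facts, key_facts):
    """Score one answer; exact string matching stands in for real claim verification."""
    claims = set(answer_claims)
    supported = claims & set(source_facts)  # claims the source backs up
    covered = claims & set(key_facts)       # key facts the answer actually states
    # An empty answer gets no accuracy credit; an empty key-fact list cannot be incomplete.
    accuracy = len(supported) / len(claims) if claims else 0.0
    completeness = len(covered) / len(set(key_facts)) if key_facts else 1.0
    return AnswerScores(accuracy, completeness)


# Hypothetical contract question: the source states four facts, all of which
# a complete answer should include.
source_facts = [
    "term is 24 months",
    "renewal is automatic",
    "notice period is 90 days",
    "late penalty is 5%",
]
key_facts = source_facts

# The answer is entirely correct but mentions only two of the four facts.
answer = ["term is 24 months", "renewal is automatic"]

print(score_answer(answer, source_facts, key_facts))
# → AnswerScores(factual_accuracy=1.0, completeness=0.5)
```

An answer can score perfectly on the first metric while scoring poorly on the second, which matches the pattern the summary reports. How EnterpriseDocBench actually extracts and matches claims is not described here.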

Ranking rationale: this cluster describes a new academic paper that introduces an evaluation framework for AI systems.

Read on arXiv cs.CL.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 · Saurabh K. Singh, Sachin Raj · 2026-04-30 04:00

Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

arXiv:2604.26382v1 (Announce Type: new). Abstract: Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to ta…

arXiv cs.CL TIER_1 · Sachin Raj · 2026-04-29 07:48

Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to take a swing at it: parsing fidelity, indexing eff…

Hugging Face Daily Papers TIER_1 · 2026-04-29 07:48