PulseAugur
Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

New framework benchmarks enterprise AI document-processing pipelines

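Researchers have developed EnterpriseDocBench, a new framework for evaluating the end-to-end performance of enterprise AI document-processing pipelines. Across six enterprise domains, the framework measures parsing fidelity, indexing efficiency, retrieval relevance, and generation groundedness. Initial tests show that hybrid retrieval slightly outperforms BM25 and, surprisingly, that hallucination rates are higher for very short and very long documents than for medium-length ones. A key finding is that factual accuracy is high while answer completeness is markedly lower, which indicates that these AI systems often omit key information.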
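The summary does not say how the hybrid retriever combines its signals. One common way to fuse a lexical ranking (such as BM25) with a dense-embedding ranking is reciprocal rank fusion (RRF). The sketch below assumes that technique; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are illustrative assumptions, not details from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Assumption: this is one possible hybrid scheme, not necessarily the one
    used in EnterpriseDocBench.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank); higher-ranked docs contribute more.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from a BM25 retriever and a dense retriever.
bm25_ranking = ["doc3", "doc1", "doc7"]
dense_ranking = ["doc1", "doc4", "doc3"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

Because RRF uses only ranks, it sidesteps the problem that BM25 scores and cosine similarities sit on incompatible scales. Here `doc1` rises to the top because both retrievers rank it highly.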

Impact: Highlights a critical gap in enterprise AI. Accuracy is high but answer completeness is low, which affects real-world deployment.
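To see how accuracy and completeness can diverge, treat an answer as a set of claims and score it two ways: accuracy is the share of its claims that the source supports, and completeness is the share of required key points it covers. This is a simplified sketch under stated assumptions. Exact set matching on claim labels and the helper names `factual_accuracy` and `answer_completeness` are hypothetical, and the paper's actual metrics may differ (for example, by using a model-based judge).

```python
def factual_accuracy(answer_claims: set[str], source_facts: set[str]) -> float:
    """Fraction of the answer's claims that the source supports (hypothetical metric)."""
    if not answer_claims:
        return 0.0
    return len(answer_claims & source_facts) / len(answer_claims)


def answer_completeness(answer_claims: set[str], key_points: set[str]) -> float:
    """Fraction of the required key points that the answer covers (hypothetical metric)."""
    if not key_points:
        return 1.0
    return len(answer_claims & key_points) / len(key_points)


# Hypothetical example: every claim is correct, but half the key points are missing.
source_facts = {"f1", "f2", "f3", "f4", "f5"}
key_points = {"f1", "f2", "f3", "f4"}
answer = {"f1", "f2"}

print(factual_accuracy(answer, source_facts))   # 1.0
print(answer_completeness(answer, key_points))  # 0.5
```

An answer can therefore score perfectly on accuracy while still leaving out information the user needed, which is the pattern the summary reports.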

Ranking rationale: This cluster describes a new academic paper that introduces an evaluation framework for AI systems.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

  1. arXiv cs.CL TIER_1 · Saurabh K. Singh, Sachin Raj

    Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

    arXiv:2604.26382v1 Announce Type: new Abstract: Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to ta…

  2. arXiv cs.CL TIER_1 · Sachin Raj

    Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

    Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to take a swing at it: parsing fidelity, indexing eff…

  3. Hugging Face Daily Papers TIER_1

    Benchmarking Complex Multimodal Document Processing Pipelines: A Unified Evaluation Framework for Enterprise AI

    Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to take a swing at it: parsing fidelity, indexing eff…