When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

New OCR benchmark shows that accuracy does not guarantee RAG performance

By the PulseAugur editorial team · [1 source] · 2026-05-05 04:00

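A new benchmark has been developed to evaluate the robustness of optical character recognition (OCR) systems in retrieval-augmented generation (RAG) applications. Current OCR benchmarks rely on character-level metrics, which fail to capture how OCR errors affect downstream RAG performance in real-world industrial scenarios. The benchmark covers 11 challenging document types and shows that high OCR accuracy does not guarantee effective RAG, because structural and semantic errors can cause severe retrieval failures.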
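To make the gap between character-level accuracy and retrieval concrete, here is a minimal Python sketch. It is not the benchmark's method or data: the sample sentence, the OCR confusion (digit 0 read as letter O), and the helper names `levenshtein`, `char_error_rate`, and `token_match` are all illustrative assumptions. It only shows how a page can score a low character error rate while an exact-term lookup for the key identifier still fails.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return levenshtein(reference, hypothesis) / len(reference)


def token_match(query: str, passage: str) -> bool:
    """Toy retriever (an assumption): every query token must appear verbatim."""
    tokens = set(passage.lower().split())
    return all(t in tokens for t in query.lower().split())


# Hypothetical ground truth and OCR output; only the model number is garbled.
reference = "The warranty period for model X200 is 24 months from delivery"
ocr_output = "The warranty period for model X2OO is 24 months from delivery"

cer = char_error_rate(reference, ocr_output)
print(f"CER: {cer:.3f}")
print("retrieved from ground truth:", token_match("X200", reference))
print("retrieved from OCR output:  ", token_match("X200", ocr_output))
```

In this toy case only two of the 61 characters differ, yet the one token a user would search for is no longer present in the OCR text. Real retrievers (dense or hybrid) degrade in less binary ways, so this should be read as an illustration of the failure mode, not a measurement.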

Impact: Highlights that OCR evaluation needs to go beyond character accuracy if RAG systems are to be deployed effectively.

Ranking rationale: This is a research paper that introduces a new benchmark for evaluating OCR systems in a RAG context.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Lin Sun, Wang Dexian, Jingang Huang, Linglin Zhang, Change Jia, Zhengwei Cheng, Xiangzheng Zhang · 2026-05-05 04:00

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

arXiv:2605.00911v1. Abstract: Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downs…
