A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

New pipeline improves AI extraction accuracy on long financial documents

By PulseAugur Editorial Team · [2 sources] · 2026-04-29 09:19

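Researchers have developed a multistage extraction framework designed to improve the accuracy of structured information extraction from long, scanned financial documents. The pipeline combines image preprocessing, OCR, page-level retrieval, and extraction with a vision-language model (VLM), separating page localization from multimodal reasoning. Tested on 120 production-grade KYC documents, the best configuration reached 87.27% accuracy, 31.9 percentage points higher than applying a VLM directly.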
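The four stages named above can be sketched roughly as follows. This is a minimal illustration of the stage ordering only, not the authors' implementation: the `Page` type, the `preprocess` placeholder, the keyword-overlap scoring in `retrieve_pages`, and the `ocr` and `vlm_extract` callables are all hypothetical stand-ins, since the summary does not say which OCR engine, retrieval method, or VLM the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    number: int       # 1-based page index in the scanned document
    image: bytes      # page image after preprocessing
    text: str = ""    # OCR output for this page


def preprocess(image: bytes) -> bytes:
    """Placeholder for image cleanup such as deskewing or denoising.

    Assumption: returns the input unchanged; a real stage would modify it.
    """
    return image


def retrieve_pages(pages: List[Page], query: str, top_k: int = 2) -> List[Page]:
    """Page-level retrieval: rank pages by query-term overlap with OCR text.

    Assumption: simple keyword overlap stands in for whatever retriever
    the paper actually uses. Ties keep the original page order.
    """
    terms = set(query.lower().split())

    def score(page: Page) -> int:
        return len(terms & set(page.text.lower().split()))

    return sorted(pages, key=score, reverse=True)[:top_k]


def run_pipeline(
    images: List[bytes],
    field_query: str,
    ocr: Callable[[bytes], str],
    vlm_extract: Callable[[List[Page], str], Dict],
    top_k: int = 2,
) -> Dict:
    """Preprocess -> OCR -> page retrieval -> VLM extraction."""
    pages = []
    for number, image in enumerate(images, start=1):
        clean = preprocess(image)
        pages.append(Page(number=number, image=clean, text=ocr(clean)))
    # Page localization happens here, before any multimodal reasoning.
    candidates = retrieve_pages(pages, field_query, top_k)
    # The VLM only sees the retrieved pages, not the whole document.
    return vlm_extract(candidates, field_query)


# Usage with toy stand-ins for the OCR engine and the VLM.
toy_images = [b"cover page", b"registered company name Acme Ltd", b"signature page"]
toy_ocr = lambda image: image.decode()
toy_vlm = lambda pages, query: {"field": query, "pages": [p.number for p in pages]}
result = run_pipeline(toy_images, "company name", toy_ocr, toy_vlm, top_k=1)
print(result)
```

The point of the split is that the VLM call receives only the few pages that retrieval selects, which is what separating page localization from multimodal reasoning amounts to in practice.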

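Impact: Strengthens the ability to extract structured data from complex financial documents, which could streamline compliance and KYC workflows.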

Ranking rationale: Academic paper detailing a new framework for extracting information from financial documents.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Yuxuan Han, Yuanxing Zhang, Yushuo Wang, Yichao Jin, Kenneth Zhu Ke, Jingyuan Zhao · 2026-04-30 04:00

A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

arXiv:2604.26462v1 Announce Type: new Abstract: Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically non machine readable, noisy, and visually heteroge…

arXiv cs.CV TIER_1 English(EN) · Jingyuan Zhao · 2026-04-29 09:19

A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically non machine readable, noisy, and visually heterogeneous. They usually span dozens of pages while c…
