Researchers have developed a multistage framework for extracting structured information from long, scanned financial documents. The pipeline combines image preprocessing, OCR, page-level retrieval, and vision-language model (VLM) based extraction, separating page localization from multimodal reasoning. Tested on 120 production KYC documents, the best configuration achieved 87.27 percent accuracy, outperforming direct VLM application by up to 31.9 percentage points.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Enhances structured data extraction from complex financial documents, potentially streamlining compliance and KYC workflows.
RANK_REASON Academic paper detailing a new framework for information extraction from financial documents.
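The separation of page localization from extraction described in the summary can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption: the function names are hypothetical, OCR is taken as already done (each page is a plain string), retrieval is a simple term-count score, and a regular expression stands in for the VLM call.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(query: str, page_text: str) -> int:
    """Count how often the query's terms occur in a page's OCR text.

    Assumption: a plain term-count score; the source does not say how
    page-level retrieval is scored.
    """
    counts = Counter(tokenize(page_text))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve_pages(query: str, ocr_pages: list[str], top_k: int = 1) -> list[int]:
    """Stage 1 (page localization): return indices of the top_k pages."""
    ranked = sorted(
        range(len(ocr_pages)),
        key=lambda i: score_page(query, ocr_pages[i]),
        reverse=True,
    )
    return ranked[:top_k]


def extract_field(page_text: str, label: str) -> str | None:
    """Stage 2 (extraction): placeholder for the VLM call.

    Assumption: a 'label: value' regex stands in for multimodal reasoning.
    """
    pattern = rf"{re.escape(label)}\s*:\s*(.+)"
    match = re.search(pattern, page_text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def run_pipeline(ocr_pages: list[str], label: str, top_k: int = 1) -> dict:
    """Localize candidate pages first, then extract only from those pages."""
    for idx in retrieve_pages(label, ocr_pages, top_k):
        value = extract_field(ocr_pages[idx], label)
        if value is not None:
            return {"field": label, "page": idx, "value": value}
    return {"field": label, "page": None, "value": None}


# Hypothetical three-page document, already passed through OCR.
pages = [
    "Annual report cover page",
    "Company details\nRegistered address: 1 Example Street",
    "Signatures and declarations",
]
result = run_pipeline(pages, "Registered address")
```

In this sketch the extractor only ever sees the pages that retrieval selects, which is the design choice the summary credits for the accuracy gain over applying a VLM to the whole document directly.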