Researchers have developed and benchmarked an adaptive Optical Character Recognition (OCR) pipeline for digitizing retail bills across commercial sectors. The system combines a CNN-based image enhancement module, an image quality analyzer, a feedback loop for iterative retries, and an NLP-based correction layer. Tested on a dataset of 360 retail bills, the pipeline achieved a Character Error Rate (CER) of 18.4% and a Word Error Rate (WER) of 27.6%, significantly outperforming the raw Tesseract baseline and running notably faster than EasyOCR.
Summary written by gemini-2.5-flash-lite from 2 sources.
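For readers unfamiliar with the two metrics quoted in the summary, the following is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth (in characters for CER, in whitespace-separated words for WER). The function names and the sample strings are illustrative assumptions, and the paper's exact text normalization (case folding, whitespace handling) is not stated in the summary, so this should not be read as the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical bill line: the OCR confuses the letter O with the digit 0.
    truth = "TOTAL 12.50"
    ocr_output = "T0TAL 12.5O"
    print(f"CER = {cer(truth, ocr_output):.3f}")
    print(f"WER = {wer(truth, ocr_output):.3f}")
```

In this made-up example two character substitutions out of eleven reference characters give a CER of about 0.18, while both words are wrong, giving a WER of 1.0. This illustrates why WER is typically higher than CER on the same output, consistent with the 27.6% versus 18.4% figures reported above.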
IMPACT Establishes a new benchmark for OCR in retail, potentially improving data extraction efficiency for businesses.
RANK_REASON Academic paper detailing a new OCR pipeline and its benchmarked performance.