Researchers have developed and benchmarked an adaptive Optical Character Recognition (OCR) pipeline designed specifically for digitizing diverse retail bills. The system combines a CNN-based image enhancement module, an image quality analyzer, and an NLP-based correction layer to cope with variations in scan quality and layout. On a dataset of 360 retail bill images, the pipeline outperformed the Tesseract baseline, achieving a Character Error Rate (CER) of 18.4% and a Word Error Rate (WER) of 27.6%.
Summary written by gemini-2.5-flash-lite from 1 source.
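For context on the two reported metrics: CER and WER are conventionally defined as the edit (Levenshtein) distance between the OCR output and the ground-truth text, divided by the length of the ground truth, counted in characters for CER and in words for WER. The summary does not say exactly how the paper computed or aggregated them (for example, averaged per image or summed over the whole dataset), so the sketch below assumes the standard single-string definitions. The function names and the sample receipt line are illustrative only.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp.

    Works on strings (character level) or lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute r with h (or match)
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / characters in the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / words in the reference (whitespace tokenization assumed)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical receipt line with two typical OCR confusions (O <-> 0).
print(cer("TOTAL 12.50", "T0TAL 12.5O"))  # 2 edits / 11 chars
print(wer("TOTAL 12.50", "T0TAL 12.5O"))  # 2 wrong words / 2 words
```

The example also shows why WER is usually higher than CER, as in the reported 27.6% versus 18.4%: a single wrong character is enough to make the whole word count as an error.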
IMPACT Establishes a new benchmark for OCR in retail bill digitization, potentially improving efficiency for businesses dealing with varied document formats.
RANK_REASON This is a research paper detailing a new OCR pipeline and its benchmark results.