DocRevive: A Unified Pipeline for Document Text Restoration
Researchers have developed DocRevive, a pipeline for restoring damaged or incomplete text in documents. The system combines Optical Character Recognition (OCR), image analysis, masked language modeling, and diffusion models to reconstruct text while maintaining visual fidelity. To benchmark the restoration process, the researchers created a new dataset of over 30,000 degraded document images and proposed a Unified Context Similarity Metric (UCSM) to evaluate the quality of the reconstructed text.
AI IMPACT: Advances document restoration techniques, potentially improving digital preservation and archival research.