PulseAugur

AI models achieve 99% accuracy classifying historical document images

Researchers have developed an image classification system for scanned historical document pages that distinguishes between text, tables, and graphics. Fine-tuned deep learning models, specifically RegNetY-16GF and ViT-large, achieved over 99% accuracy on a dataset of more than 48,000 scanned pages. The system is intended to support content-specific processing in large-scale digitization projects, and the models, dataset, and software have been released under open-source licenses.

IMPACT Enables efficient content-specific processing for large historical document archives, accelerating digitization efforts.
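The classifier's labels are meant to drive content-specific processing downstream. The sketch below is a minimal, hypothetical illustration of that routing step. It assumes each page has already received one predicted label (text, table, or graphic). The function `route_pages`, the handler names, and the page IDs are invented for illustration, and the released software may organize this step quite differently.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def route_pages(
    predictions: Iterable[Tuple[str, str]],
    handlers: Dict[str, Callable[[str], str]],
) -> Dict[str, List[str]]:
    """Group pages by predicted label and apply the matching handler.

    `predictions` yields (page_id, label) pairs, e.g. from a page classifier.
    Pages whose label has no registered handler go under "unhandled".
    """
    results: Dict[str, List[str]] = defaultdict(list)
    for page_id, label in predictions:
        handler = handlers.get(label)
        if handler is None:
            results["unhandled"].append(page_id)
        else:
            results[label].append(handler(page_id))
    return dict(results)


# Hypothetical downstream steps standing in for real OCR, table
# extraction, or image-handling tools. The actual label set of the
# classifier may be finer-grained than these three categories.
handlers = {
    "text": lambda pid: f"ocr:{pid}",
    "table": lambda pid: f"table-extract:{pid}",
    "graphic": lambda pid: f"image-archive:{pid}",
}

preds = [("p001", "text"), ("p002", "table"), ("p003", "graphic"), ("p004", "text")]
print(route_pages(preds, handlers))
```

Keeping the handlers in a plain dictionary means a new page category only needs a new entry rather than a change to the routing loop, and unknown labels are surfaced instead of silently dropped.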

RANK_REASON The cluster contains a research paper detailing a new AI model and its performance on a specific task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Kateryna Lutsai, Pavel Straňák, David Novák, Dana Křivánková

    Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

    arXiv:2606.07558v1 Announce Type: cross Abstract: Purpose: Digitization projects in the humanities produce vast, heterogeneous archives of historical documents, making manual sorting impractical at scale. This work addresses the need for an automated system to classify scanned pa…