PulseAugur

New Dataset Targets Low-Resource Handwritten Text Recognition for Ancient Coptic Manuscripts

Researchers have introduced SCAM, a new dataset for Handwritten Text Recognition (HTR) of ancient Sahidic Coptic manuscripts. The dataset targets the combined challenges of low-resource languages, rare scripts, and degraded historical documents: it pairs heterogeneous acquisition conditions with typical manuscript degradations such as ink fading and material deterioration. Benchmarking current state-of-the-art HTR approaches on SCAM exposes their limitations in low-resource, historically grounded scenarios and gives future work a reference point.

IMPACT This dataset could advance research in low-resource HTR, potentially improving AI's ability to process historical and underrepresented languages.

RANK_REASON The cluster is about a new academic paper introducing a dataset for a specific research task.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Fabio Quattrini, Carmine Zaccagnino, Costanza Bianchi, Silvia Cascianelli, Rita Cucchiara

    A Text Recognition Dataset from Sahidic Coptic Ancient Manuscripts

    arXiv:2606.15987v1. Abstract: In this work, we target Handwritten Text Recognition (HTR) in low-resource scenarios, which arise from underrepresented languages, rare scripts, and degraded visual conditions typical of historical documents. We introduce SCAM (Sahi…