TOOL · arXiv cs.CV English(EN) · 8h

A Text Recognition Dataset from Sahidic Coptic Ancient Manuscripts

Researchers have introduced SCAM, a new dataset for Handwritten Text Recognition (HTR) of ancient Sahidic Coptic manuscripts. The dataset targets the combined challenges of low-resource languages, rare scripts, and degraded historical documents: it pairs heterogeneous acquisition conditions with typical manuscript degradations such as ink fading and material deterioration. Benchmarks of current state-of-the-art HTR approaches on SCAM expose their limitations in low-resource, historically grounded scenarios and give a reference point for future work in the field.

IMPACT This dataset could advance research in low-resource HTR, potentially improving AI's ability to process historical and underrepresented languages.

Silvia Cascianelli PhD
Sahidic Coptic