PulseAugur

New Dataset Targets Low-Resource Handwritten Text Recognition for Ancient Coptic Manuscripts

Researchers have introduced SCAM, a new dataset for Handwritten Text Recognition (HTR) on ancient Sahidic Coptic manuscripts. The dataset targets the combined challenges of low-resource languages, rare scripts, and degraded historical documents: it pairs heterogeneous acquisition conditions with typical manuscript degradations such as ink fading and material deterioration. Evaluating current state-of-the-art HTR approaches on SCAM exposes their limitations in low-resource, historically grounded scenarios and gives future work a benchmark to measure against.

Impact: This dataset could advance research in low-resource HTR, potentially improving AI's ability to process historical and underrepresented languages.

Ranking rationale: The cluster is about a new academic paper introducing a dataset for a specific research task.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CV TIER_1 English(EN) · Fabio Quattrini, Carmine Zaccagnino, Costanza Bianchi, Silvia Cascianelli, Rita Cucchiara

    A Text Recognition Dataset from Sahidic Coptic Ancient Manuscripts

    arXiv:2606.15987v1 Announce Type: new Abstract: In this work, we target Handwritten Text Recognition (HTR) in low-resource scenarios, which arise from underrepresented languages, rare scripts, and degraded visual conditions typical of historical documents. We introduce SCAM (Sahi…