Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN)

A Human-in-the-Loop Label Error Detection Framework Applied to Arabic-Script HTR Datasets

Researchers have developed CER-HV, a two-stage framework for improving the quality of datasets used to train Handwritten Text Recognition (HTR) models, with a focus on Arabic-script languages. The first stage uses a Convolutional Recurrent Neural Network (CRNN) to detect likely label errors automatically; the second stage passes the flagged samples to a human for verification. Applied to Arabic-script datasets, CER-HV identified label errors such as transcription and segmentation mistakes. After the datasets were cleaned and the models retrained, the evaluation character error rate (CER) improved by up to 1.8 percentage points.
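The two-stage idea can be illustrated with a minimal sketch. The brief does not describe how the CRNN's output is turned into an error flag, so the following assumes one plausible approach: compute the CER between each stored label and the model's prediction, then queue the samples with the highest disagreement for human review. The function names, the sample tuple layout, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(label: str, prediction: str) -> float:
    """Character error rate of a prediction, normalised by label length."""
    if not label:
        return 0.0 if not prediction else 1.0
    return edit_distance(label, prediction) / len(label)


def flag_for_review(
    samples: List[Tuple[str, str, str]],  # (sample_id, label, model_prediction)
    threshold: float = 0.5,               # hypothetical cut-off, not from the paper
) -> List[Tuple[str, float]]:
    """Return (sample_id, cer) pairs at or above the threshold, worst first."""
    scored = [(sid, cer(label, pred)) for sid, label, pred in samples]
    flagged = [(sid, score) for sid, score in scored if score >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

In this sketch a high CER does not prove the label is wrong, since the model itself may have misread the image. That ambiguity is why a human verification stage follows the automated one.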

IMPACT Cleaner labels in Arabic-script HTR datasets lower the error rate of retrained models, which may help research and development in the field.

Sana Sabah Sabry Al-Azzawi
KHATT
Convolutional Recurrent Neural Network
CER-HV
Ajami
Muharaf