A Human-in-the-Loop Label Error Detection Framework Applied to Arabic-Script HTR Datasets
Researchers have developed CER-HV, a two-stage framework for improving the quality of datasets used to train Handwritten Text Recognition (HTR) models, particularly for Arabic-script languages. The framework combines a Convolutional Recurrent Neural Network (CRNN) for automated error detection with a human-in-the-loop verification step. Applied to Arabic-script datasets, CER-HV identified label errors such as transcription and segmentation mistakes; after dataset cleaning and model retraining, the evaluation character error rate (CER) improved by up to 1.8 percentage points.
AI IMPACT: Improves dataset quality for Arabic HTR, potentially accelerating research and development in the field.
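The summary does not say how the CRNN's output is turned into a list of suspect labels. One plausible reading, sketched below in Python, is to score each sample by the character error rate between the model's prediction and the dataset label, then pass the highest-scoring samples to a human reviewer. This is an assumption made for illustration and not the authors' implementation: the function names, the sample tuples, and the 0.5 threshold are all hypothetical, and the predictions are hard-coded stand-ins for CRNN output.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(prediction: str, label: str) -> float:
    """Character error rate of a prediction, measured against the label."""
    if not label:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, label) / len(label)


def flag_for_review(
    samples: List[Tuple[str, str, str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the source
) -> List[Tuple[str, float]]:
    """Return (sample_id, cer) pairs at or above the threshold, worst first.

    Each sample is (sample_id, dataset_label, model_prediction).
    A high CER may indicate a wrong label OR a wrong prediction, which is
    why flagged samples go to a human instead of being dropped automatically.
    """
    scored = [(sid, cer(pred, label)) for sid, label, pred in samples]
    flagged = [item for item in scored if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical samples: (id, dataset label, stand-in model prediction).
samples = [
    ("line_001", "سلام", "سلام"),   # label and prediction agree
    ("line_002", "كتاب", "باب"),    # partial disagreement
    ("line_003", "مدرسة", "قلم"),   # strong disagreement, likely label error
]

review_queue = flag_for_review(samples)
for sample_id, score in review_queue:
    print(f"{sample_id}: CER={score:.2f} -> send to human verification")
```

Ranking by disagreement keeps the human effort focused on the small fraction of samples most likely to be mislabeled, which is the usual motivation for a human-in-the-loop stage; the reviewer, not the threshold, makes the final call on whether a label is corrected or removed.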