A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition
Researchers conducted a comparative study of pretrained Transformer models for Quranic Automatic Speech Recognition (ASR), aiming to reduce the high Word Error Rate (WER) observed on user-recited verses. The study fine-tuned models such as Wav2Vec2.0, HuBERT, and XLS-R on an 870-hour Quranic dataset and examined how speech representations, label formats, and dataset composition affect transcription accuracy. The best configuration achieved a WER of 0.08 on the EveryAyah subset, a significant improvement over the Citrinet baseline, while also reducing training time.
AI IMPACT: Improves accuracy and efficiency for specialized ASR tasks, potentially aiding Quranic study and accessibility.
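
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference and the model's transcript, divided by the number of reference words. Read as a fraction, a WER of 0.08 would mean roughly 8 word errors per 100 reference words. The sketch below is a minimal illustration of that standard definition, not the study's evaluation code; the function name `word_error_rate` and the placeholder tokens are assumptions made for the example, and whitespace tokenization is a simplification that real Arabic-script evaluation may not use.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for the words of a verse.
print(word_error_rate("w1 w2 w3 w4", "w1 w2 w4"))  # one dropped word out of four
```

Because the denominator is the reference length, inserted words can push WER above 1.0, which is why it is reported as an error rate and not as an accuracy.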