Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 10h

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

A comparative study of pretrained Transformer models for Quranic Automatic Speech Recognition (ASR) aims to reduce the high Word Error Rates (WER) seen on user-recited verses. The authors fine-tuned models including Wav2Vec2.0, HuBERT, and XLS-R on an 870-hour Quranic dataset and examined how speech representations, label formats, and dataset composition affect transcription accuracy. The best configuration achieved a WER of 0.08 on the EveryAyah subset, a significant improvement over the Citrinet baseline, while also reducing training time.
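For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference and the model output, divided by the number of reference words, so a WER of 0.08 corresponds to roughly 8 errors per 100 reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate` is hypothetical, and the brief does not say how the study normalizes text (for example, diacritics) before scoring, so whitespace tokenization here is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no diacritic stripping or other normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "بسم الله الرحمن الرحيم"
    hypothesis = "بسم الله الرحيم"  # one word dropped
    print(word_error_rate(reference, hypothesis))  # 1 deletion / 4 words = 0.25
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.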

IMPACT Improves accuracy and efficiency for specialized ASR tasks, potentially aiding Quranic study and accessibility.

Hugging Face
transformer
HuBERT
wav2vec2.0
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
EveryAyah
Tartil
Citrinet
Wav2Vec2-XLSR-53