A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

arXiv:2606.19747v1. Abstract: Quran Automatic Speech Recognition (ASR) aims to convert Quranic recitation into text, enabling applications such as aided memorisation tools and Quranic search engines. However, existing ASR models often exhibit high Word Error Rates (WER) on user-recited verses and lack full coverage of the Quranic corpus. This paper presents a systematic empirical study of domain-specific fine-tuning of pretrained Transformer-based models for Quranic ASR, using advanced speech feature extraction methods: Wav2Vec2.0, HuBERT, and XLS-R. These models apply self-s
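For readers unfamiliar with the metric named in the abstract, Word Error Rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal standard-library illustration of that definition, not the paper's evaluation code. The function name `word_error_rate` and the whitespace tokenisation are assumptions made here, and the brief does not say how the paper normalises text (for example, whether diacritics are kept), so no normalisation is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no text normalisation, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("بسم الله الرحمن الرحيم", "بسم الله الرحيم"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.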
This research is emerging now because rapid advances in Transformer models and speech representation learning have made sophisticated ASR for specialised domains and low-resource languages more feasible.
Improving Quranic ASR can enhance the accessibility and utility of religious texts, potentially boosting digital literacy and engagement among large populations worldwide.
The accuracy of automated Quranic recitation transcription is expected to improve, enabling better applications for learning, memorization, and search.
- ASR developers
- Education technology
- Muslim communities
- Religious institutions
- Traditional learning methods
- Inefficient manual transcription services
Enhanced tools for Quranic study and memorization become more widespread and effective.
Increased digital engagement with religious texts could foster new forms of community and learning platforms.
The development of highly accurate domain-specific ASR could set a precedent for similar applications across other less-resourced languages and cultural materials.