The Effects of Fine-Tuning on the ASR Performance of Dialectal Arabic

Özyilmaz, Ömer Tarik (2024) The Effects of Fine-Tuning on the ASR Performance of Dialectal Arabic. Master thesis, Voice Technology (VT).

PDF: MA-3951731-OT-Ozyilmaz.pdf (746kB)

Abstract

Current commercial automatic speech recognition (ASR) applications support the formal Modern Standard Arabic (MSA). Yet, in conversational speech, the speakers’ dialect often determines the meaning, diacritics, and accent. In the current study, dialectal Arabic is investigated as a low-resource ASR problem, owing to the lack of variation, volume, and balance in dialectal speech datasets. We focus on improving the performance of OpenAI’s Whisper on five major Arabic dialects: Gulf, Levantine, Iraqi, Egyptian, and Maghrebi. The effect of MSA training size is evaluated to determine a proper cut-off point for the initial iteration of fine-tuning. Then, the outcome of this pre-training is evaluated to theorize whether the dialects share a commonality with each other and with MSA. Finally, the difference in performance between dialect-specific and dialect-pooled models is presented and discussed. After fine-tuning a Whisper checkpoint on Mozilla Common Voice 16.1 for MSA and on the large-scale MASC dataset for dialectal Arabic, we report results using the word error rate (WER) and character error rate (CER). We find that fine-tuning with a small amount of MSA training data already yields a large performance gain, producing models that perform similarly to much larger models without fine-tuning. The effect of pre-training is minimal, leading us to believe that the differences between each dialect and MSA are too large to generalise across. Further, a small drop in performance is found when moving from dialect-specific to dialect-pooled models, and contrary to previous studies, we argue that the benefits outweigh this cost. Dialect-pooled models present an exciting opportunity to reduce the data deficiency problem, especially when paired with careful data curation. Overall, our experiments provide valuable insights for improving the fine-tuning of dialectal Arabic ASR models and suggest potential implications for other low-resource languages.
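The evaluation metrics named in the abstract, word error rate (WER) and character error rate (CER), are both ratios of Levenshtein edit distance to reference length, computed over words and characters respectively. A minimal sketch of how they are typically computed (the example strings below are illustrative, not drawn from the thesis data):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

In practice a library such as `jiwer` or Hugging Face `evaluate` would be used, typically after text normalisation (a step that matters greatly for Arabic diacritics), but the underlying computation is the one above.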

Item Type: Thesis (Master)
Supervisor: Schauble, J.K.
Date Deposited: 10 Jun 2024 10:21
Last Modified: 10 Jun 2024 10:21
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/464
