Tepei, Maria (2024) Addressing ASR Bias Against Foreign-Accented Dutch: A Synthetic Data Approach. Master thesis, Voice Technology (VT).
Abstract
Despite substantial improvements in automatic speech recognition (ASR) in recent years, the high performance achieved for "standard speakers" does not hold across all genders, ages, or foreign accents. As a result, inclusive ASR, which aims to reduce the performance gaps such systems display across subgroups of the population, has become an important area of research. In the present thesis, I evaluate one of the most recent and robust ASR systems (OpenAI’s Whisper) to uncover and assess the level of bias it displays against foreign-accented Dutch. Additionally, I investigate whether synthetically accented speech samples obtained from a fine-tuned speech synthesis model (FastSpeech2) can serve as a viable data augmentation tool, providing additional training data for Whisper in a fine-tuning transfer learning paradigm. By investigating bias, as opposed to WER reduction alone, I pay attention to both the improvement in performance on foreign-accented Dutch and the potential decrease in performance on native Dutch. Experimental results show that fine-tuning Whisper on synthetic accented speech data does increase its performance on natural speech samples, although this comes at the cost of decreased performance on native samples. Additionally, the insights from fine-tuning Whisper call into question its suitability for this learning paradigm, as its large number of parameters leads to increased instability on small, low-resource datasets.
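To illustrate the distinction the abstract draws between WER reduction and bias, the following is a minimal sketch of how a performance gap between native and foreign-accented speech could be computed. It assumes that bias is measured as the difference between the two groups' corpus-level word error rates; the abstract does not state the exact bias metric used in the thesis, so the function names (`word_edit_distance`, `corpus_wer`, `wer_gap`) and the toy transcripts are hypothetical.

```python
from typing import Iterable, Tuple


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (ref, hyp) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        errors += word_edit_distance(reference, hypothesis)
        words += len(reference.split())
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


def wer_gap(native_pairs, accented_pairs) -> float:
    """Assumed bias measure: accented WER minus native WER (0 means no gap)."""
    return corpus_wer(accented_pairs) - corpus_wer(native_pairs)


# Toy, made-up transcripts purely for illustration (not thesis data).
native = [("de kat zit op de mat", "de kat zit op de mat")]
accented = [("de kat zit op de mat", "de kat zit op mat")]
gap = wer_gap(native, accented)
```

Under this assumed measure, a fine-tuned model would be compared with the baseline on both quantities: the accented WER should fall, while the native WER should not rise enough to offset the gain.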
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Do, T.P. |
Date Deposited: | 12 Jun 2024 08:38 |
Last Modified: | 12 Jun 2024 08:38 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/470 |