Aladdin, Alla dien, Allendien, please evaluate the performance of Whisper on Dutch dysarthric speech

Feenstra, Lian (2023) Aladdin, Alla dien, Allendien, please evaluate the performance of Whisper on Dutch dysarthric speech. Master thesis, Voice Technology (VT).


Abstract

Automatic Speech Recognition (ASR) converts speech into text. Although this technology is now readily available to most of us, it is not yet usable for everyone. People with dysarthria experience worse ASR performance than people without dysarthria. This is mostly due to the smaller amount of dysarthric data available and the characteristics of dysarthric speech. The severity of the dysarthria also affects the ASR performance for a given speaker. Recently the Whisper speech recognition model was released, and its creators promote it as a robust system. In this research the weakly supervised encoder-decoder Transformer architecture (Whisper) is compared to a hybrid model in a Dutch dysarthric speech recognition task. The Whisper model is finetuned on dysarthric data from the COPAS corpus and evaluated on the Domotica-3 dataset. After finetuning, results showed a limited decrease in the word error rate (WER) for speakers with moderate and high severity dysarthria, whereas speakers with mild severity dysarthria saw worse performance. Whisper did not outperform the hybrid ASR model on any of the speakers used in the evaluation. Future research is needed to make ASR systems truly accessible to the demographic of dysarthric speakers. This research could focus on evaluating the Whisper architecture using a different pretraining strategy, using transfer learning, or using a different evaluation dataset. Another possibility would be to analyse the speech of dysarthric speakers to get a clearer view of which characteristics of that speech degrade ASR performance.
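
The abstract reports results as WER (word error rate). As a point of reference, the standard definition is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that definition only; it is not the thesis's evaluation code, and the function name and the example command are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the wake word "aladdin" is recognized as two words.
reference = "aladdin doe de deur open"
hypothesis = "alla dien doe de deur open"
print(word_error_rate(reference, hypothesis))
```

In this made-up example one substitution and one insertion against five reference words give a WER of 0.4. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.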

Item Type:	Thesis (Master)
Name supervisor:	Nayak, S. and Coler, M.L.
Date Deposited:	12 Sep 2023 11:15
Last Modified:	12 Sep 2023 11:15
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/377
