Dutch Dysarthric Speech Recognition: Applying Self-Supervised Learning to Overcome the Data Scarcity Issue

Matsushima, Tatsunari (2022) Dutch Dysarthric Speech Recognition: Applying Self-Supervised Learning to Overcome the Data Scarcity Issue. Master thesis, Voice Technology (VT).

Abstract

Automatic speech recognition (ASR) has been applied successfully in many domains. However, the development of ASR for dysarthric speech, a common type of pathological speech disorder, has been hindered by a lack of training data. Because supervised learning is data-hungry and demands expensive manual annotation, it is poorly suited to developing ASR for dysarthric speech. Motivated by the success of self-supervised learning (SSL) in ASR for low-resource languages, which face a similar data limitation, this research applies SSL to Dutch dysarthric speech recognition for the first time. The state-of-the-art model wav2vec 2.0 and XLSR-53, a cross-lingual variant of wav2vec 2.0, are used for benchmarking. The results show that the SSL models perform worse than the supervised DNN-HMM model. However, the author observed that the SSL models generalize better across severity groups and patients. Since dysarthric speech features differ significantly depending on severity, type of disorder, and speaker characteristics, it is hypothesized that this generalization ability may itself degrade the SSL models' performance. Hence, the research further develops speaker-dependent ASR for dysarthric speech. The results show that only roughly 10 minutes of re-fine-tuning with the target speaker's utterances significantly improves the models' performance, with a best word error rate (WER) of 10.79. This demonstrates how speaker-dependent SSL can remove the data-limitation constraint in developing dysarthric speech recognition, and it is an important milestone toward working-level Dutch dysarthric speech recognition. The author summarizes the outcome as an SSL training strategy framework for dysarthric speech recognition to catalyze future research.
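
The abstract reports its headline result as a word error rate (WER). As a reading aid, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the evaluation code used in the thesis, which may apply different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Illustrative sketch only. Words are obtained by whitespace splitting,
    with no case folding or punctuation handling (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical Dutch example: one reference word is dropped.
    print(word_error_rate("de kat zit op de mat", "de kat zit op mat"))
```

The function returns a fraction; multiplying by 100 gives the percentage form in which WER is usually reported.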

Item Type:	Thesis (Master)
Name supervisor:	Nayak, S. and Coler, M.L.
Date Deposited:	09 Sep 2022 08:55
Last Modified:	09 Sep 2022 08:55
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/211
