Su, Cantao (2024) Enhancing English Dysarthric Speech Recognition with Age-Matched Healthy Speech: A Fine-Tuning Approach Using wav2vec 2.0. Master thesis, Voice Technology (VT).
Abstract
Automatic Speech Recognition (ASR) has made significant advancements since its advent, particularly in recent years. However, ASR for dysarthric speech remains a substantial challenge due to its high variability and the limited labelled data for training. This thesis focuses on the fine-tuning phase of the wav2vec 2.0 model, which is pre-trained on large-scale English datasets, aiming to improve the recognition accuracy of dysarthric speech. Specifically, this study investigates the impact of incorporating age-matched healthy speech during the fine-tuning process. Utilising the TORGO dataset, which includes dysarthric speech from speakers with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS) alongside non-dysarthric controls, this thesis evaluates the performance of ASR models fine-tuned with and without age-matched healthy speech. The methodology involves comparing models fine-tuned with dysarthric speech alone, dysarthric speech combined with age-matched healthy speech, and dysarthric speech combined with age-unmatched healthy speech. In addition to speaker-independent settings, this study also expands to speaker-dependent scenarios by fine-tuning and validating models on speech data from individual dysarthric speakers with varying levels of intelligibility. This approach provides a comprehensive evaluation of the models’ performance across different severity levels of dysarthria. The results of this research provide practical insights into the effectiveness of incorporating age-matched healthy speech data in training robust ASR models for dysarthric speech. By leveraging the strengths of wav2vec 2.0 and utilising age-matched data, this work aims to contribute to the development of more accurate and reliable ASR systems for individuals with speech impairments. Ultimately, this research seeks to improve the accessibility of voice technology and communication for affected populations.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Verkhodanova, V. |
Date Deposited: | 05 Aug 2024 08:44 |
Last Modified: | 05 Aug 2024 08:44 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/543 |