Evaluation of wav2vec 2.0 Speech Recognition for the Elderly Frisian Population

Shekoufandeh, Golshid (2023) Evaluation of wav2vec 2.0 Speech Recognition for the Elderly Frisian Population. Master thesis, Voice Technology (VT).


Abstract

Automatic Speech Recognition (ASR) converts speech into text. It has become crucial in daily life, as evident from the utility of virtual assistants like Alexa and Siri and other assistive tools. Most publicly available ASR models are designed for the English language. Only a few support Frisian, an under-resourced Germanic language. Moreover, none of these models are tailored explicitly for elderly speakers. The lack of adequate ASR resources for the Frisian language poses an intersectional disadvantage for elderly speakers, resulting in significant challenges in developing technologies to address the needs of this community. To address this gap, increasing the availability of training data is necessary. In this study, I propose using data augmentation techniques to augment elderly audio recordings. These augmented datasets will be used to train the wav2vec 2.0 XLS-R model, which has shown promise in Frisian ASR. My co-developed model, fine-tuned from the Facebook XLS-R Wav2Vec2 model, achieved a word error rate (WER) of 15.35% when trained on the Common Voice dataset. The main objective of this research is to investigate the effect of fine-tuning the model using augmented elderly speech data tailored explicitly for Frisian elderly speakers. By integrating this dataset, I expanded the collection of recorded speech from elderly Frisian individuals, leading to a 20% relative improvement in WER for Frisian elderly ASR. This study makes a valuable contribution towards tackling the technological hurdles encountered by the local Frisian community. Furthermore, it emphasizes the significance of advancing ASR technologies for languages with limited resources and specific demographic groups. Apart from addressing the research objectives, this study offers essential contextual information, underscores the study’s importance, and recognizes the broader implications for ASR research in low-resource languages and elderly ASR.
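
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal Python sketch of how a word error rate and a relative WER improvement are commonly computed. It is not taken from the thesis: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, the WER is assumed to be the standard word-level edit distance divided by the reference length, and the example numbers are illustrative only, not results from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the WER dropped, relative to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative values only (assumptions, not figures from the thesis).
    print(word_error_rate("a b c d", "a x c"))
    print(relative_wer_reduction(0.50, 0.40))
```

Under these assumptions, a relative improvement is measured against the baseline error rate rather than in absolute percentage points, so the same relative figure can correspond to different absolute WER changes depending on the starting point.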

Item Type: Thesis (Master)
Name supervisor: Coler, M.L.
Date Deposited: 12 Sep 2023 10:55
Last Modified: 12 Sep 2023 10:55
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/347
