Ouyang, Yanpei (2024) Assessing Knowledge-Distillation Based Compression of Whisper Model for Frisian ASR. Master thesis, Voice Technology (VT).
PDF: YanepiOuyangAssessing-Knowledge-Distillation-Based-Compression-of-Whisper-Model-for-Frisian-ASR.pdf (2 MB)
Abstract
Multilingual ASR systems face challenges in accommodating diverse linguistic landscapes, particularly for low-resource languages (LRLs) with limited data. This study investigates the efficacy of model compression techniques, specifically knowledge distillation (KD) combined with fine-tuning, in enhancing the performance and efficiency of the Whisper-small model for LRLs. The research aims to determine whether applying KD and fine-tuning to the Whisper-small model can improve its performance on LRLs while reducing its computational and memory requirements. Fine-tuning experiments were conducted on the English (LibriSpeech) and Frisian (Common Voice 6.1) datasets for both the original Whisper-small model and the distilled Whisper-small model. A comprehensive evaluation was then performed using several metrics, including Word Error Rate (WER), number of model parameters, and training set size. The results show that the distilled Whisper-small model achieved a WER of 26.91% when fine-tuned with 10 hours of Frisian data, exceeding the initial reduction target; the Whisper-small model achieved a WER of 22.42% under the same conditions. The distilled model also remained competitive with limited training data, highlighting the potential of KD to create efficient ASR models for environments with constrained computational resources and data availability. Furthermore, while the Whisper-small model supports many languages, including Dutch, it was successfully fine-tuned to recognize Frisian, a language it did not originally support. Similarly, the Distil-Whisper-small model, which initially supported only English, was also successfully adapted to Frisian, showcasing the adaptability of these models for cross-linguistic applications.

In conclusion, the findings validate the effectiveness of model compression techniques, particularly KD, in enhancing the performance and efficiency of ASR models for LRLs. This study contributes to the development of more efficient and inclusive multilingual ASR systems, providing insights into optimizing ASR models for diverse linguistic landscapes, especially those with limited data. The implications extend to domains such as education, healthcare, and accessibility, ultimately advancing universal access to ASR technology and its real-world applications.

Keywords: Automatic Speech Recognition (ASR), fine-tuning, knowledge distillation (KD), Whisper-small model, Frisian language.
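The thesis evaluates its models with Word Error Rate (WER). The snippet below is a minimal illustrative sketch, not the thesis' own pipeline: it loads the stock Whisper-small checkpoint, transcribes a single clip, and scores the hypothesis against a reference transcript with WER. The file name clip.wav, the Frisian reference sentence, and the use of the jiwer and soundfile libraries are assumptions made for illustration; the thesis itself fine-tunes Whisper-small and Distil-Whisper-small on Common Voice 6.1 Frisian before evaluation.

```python
# Illustrative sketch only (not the thesis' training/evaluation code):
# transcribe one clip with the stock Whisper-small checkpoint and score it with WER.
# Assumed placeholders: "clip.wav" (16 kHz mono audio) and the Frisian reference text.
import torch
import soundfile as sf
import jiwer
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

audio, sr = sf.read("clip.wav")  # hypothetical 16 kHz mono recording

# Whisper-small has no Frisian language token; before fine-tuning, the closest
# option is to prompt it as Dutch (the thesis adds Frisian support via fine-tuning).
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
forced_ids = processor.get_decoder_prompt_ids(language="dutch", task="transcribe")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)
hypothesis = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

reference = "de kat sit op it dak"  # hypothetical Frisian reference transcript
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # word-level edit errors / reference words
```

WER here is the standard edit-distance-based word error rate; the thesis reports it as a percentage over the full test set rather than for a single clip as in this sketch.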
| Item Type: | Thesis (Master) |
|---|---|
| Name supervisor: | Nayak, S. |
| Date Deposited: | 22 Jul 2024 11:17 |
| Last Modified: | 22 Jul 2024 11:17 |
| URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/537 |