Jingsi, Huang (2024) Dutch Speech Restoration Research based on Miipher. Master thesis, Voice Technology (VT).
Abstract
This study investigates whether Miipher, a speech restoration model that integrates multiple features and has proven effective in dataset enhancement for LibriTTS-R, can be adapted to degraded Dutch speech signals. The objective is to assess the language dependencies that affect adapting the Miipher model, originally designed for monolingual English datasets, to Dutch speech restoration. The methodology involves a systematic analysis of all Miipher modules, substituting monolingual components with state-of-the-art multilingual alternatives to optimize the model’s performance for Dutch speech restoration. The research also anticipates broader benefits for speech restoration efforts in other languages. Clean Dutch datasets serve as the target and are augmented with various background noises and reverberations to simulate degraded Dutch speech. The adapted model is evaluated using the Mean Opinion Score (MOS), Word Error Rate (WER), and Character Error Rate (CER). To test the model’s robustness, the clean Common Voice dataset is used to assess both its ability to convert clean datasets downloaded from the internet to studio quality and its performance against various degradations. Results show that the adapted model improves audio quality, with consistently higher MOS than the respective original audio samples. This validates the hypothesis that the model can enhance audio affected by various degradations and convert datasets downloaded from the internet to studio quality. Two outcomes are achieved: a refined version of the Miipher model tailored for Dutch speech restoration, and insights into the broader applicability of speech restoration techniques in a multilingual context. The enhanced adaptability marks a significant advancement in speech processing and has implications for voice-related disabilities and healthcare. The refined model’s capacity to restore degraded multilingual speech signals can benefit individuals with speech impairments by providing them with clearer and more intelligible means of communication.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Nayak, S. |
Date Deposited: | 08 Aug 2024 10:51 |
Last Modified: | 08 Aug 2024 10:51 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/547 |