Ding, Shenghuan (2024) Comparative Study of Low Resource Language Manchu Speech Synthesis: Transfer Learning from Spanish vs. Mandarin Chinese. Master thesis, Voice Technology (VT).
Abstract
This study examines how transfer learning from Spanish and from Mandarin Chinese affects Manchu speech synthesis, and which source language yields better results. We compare Manchu synthesis quality under the two transfer conditions and analyze how differences in speech features between the languages influence the output. We hypothesize that, because Manchu shares more loanwords and possibly more phonemes with Mandarin Chinese than with Spanish, transfer learning from Mandarin Chinese will outperform transfer learning from Spanish. To test this, we first collected Spanish and Mandarin Chinese speech data and used them to build speech synthesis systems based on the FastSpeech 2 model. We used the Montreal Forced Aligner (MFA) to align speech with text so that the training data were consistent. We then applied the trained Spanish and Mandarin Chinese models to Manchu speech synthesis through transfer learning. Finally, we evaluated the accuracy and naturalness of the Manchu speech synthesized from each source language. The results show that transfer learning from Mandarin Chinese produces better Manchu synthesis than transfer learning from Spanish, which suggests that the linguistic features and phoneme similarity shared by Mandarin Chinese and Manchu play a key role in synthesis quality. We also found that the difference in speaker gender between the source recordings (female for Mandarin Chinese, male for Spanish) did not significantly affect the results. These results support our hypothesis. The finding is relevant to improving the quality and efficiency of speech synthesis for low-resource languages and provides a reference for future research.
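The abstract does not say how the source-language models were carried over to Manchu. One common approach, shown here only as a minimal sketch under that assumption, is to initialise the target model from every source parameter whose name and shape match, and to leave the rest (typically the phoneme embedding table, since the phoneme inventories differ) freshly initialised. The function names, parameter names, and toy values below are all hypothetical; plain Python lists stand in for the tensors of a real FastSpeech 2 checkpoint.

```python
import copy


def shape(value):
    """Return the nested-list shape of a parameter, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def transfer_parameters(source, target):
    """Copy each source parameter whose name and shape match the target.

    Returns the initialised parameter dict and the names of the parameters
    that kept their fresh target initialisation.
    """
    initialised = {}
    kept_fresh = []
    for name, fresh_value in target.items():
        if name in source and shape(source[name]) == shape(fresh_value):
            initialised[name] = copy.deepcopy(source[name])
        else:
            initialised[name] = copy.deepcopy(fresh_value)
            kept_fresh.append(name)
    return initialised, kept_fresh


# Toy stand-ins (hypothetical): a source-language model with 3 phoneme
# symbols and a Manchu model with 4, so the embedding table cannot be reused.
source_model = {
    "phoneme_embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "encoder.weight": [[1.0, 2.0], [3.0, 4.0]],
}
manchu_model = {
    "phoneme_embedding": [[0.0, 0.0] for _ in range(4)],
    "encoder.weight": [[0.0, 0.0], [0.0, 0.0]],
}

params, fresh = transfer_parameters(source_model, manchu_model)
```

In this toy setup the encoder weights are inherited from the source model, while the phoneme embedding is left to be learned from the Manchu data during fine-tuning.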
Item Type: | Thesis (Master) |
---|---|
Date Deposited: | 13 Jun 2024 07:46 |
Last Modified: | 13 Jun 2024 07:46 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/471 |