
Improving Uyghur Speech Synthesis with Monolingual Transfer Learning: A Comparison of English and Russian Pretraining

Aishan, Oufeire (2025) Improving Uyghur Speech Synthesis with Monolingual Transfer Learning: A Comparison of English and Russian Pretraining. Master thesis, Voice Technology (VT).

Abstract

In recent years, speech synthesis has made significant progress for mainstream languages. For low-resource languages such as Uyghur, however, the development and deployment of speech synthesis systems are hampered by a lack of essential resources, including speech corpora and pronunciation dictionaries, which makes effective modeling particularly difficult. Transfer learning is regarded as an effective way to alleviate this problem, especially in text-to-speech (TTS) systems: it can improve synthesis quality for low-resource languages by leveraging knowledge from high-resource languages. This study focuses on a core question: whether the choice of source language, and in particular its structural similarity to the target language, significantly affects the effectiveness of transfer learning in speech synthesis. This question bears not only on the development path of low-resource language technologies but also on the theoretical framework of cross-lingual speech modeling, so it is significant in both theoretical and applied contexts. The study conducted transfer experiments with the FastSpeech 2 architecture, using English and Russian as source languages and Uyghur as the target language. After fine-tuning and subjective evaluation, the results showed that transfer learning significantly improved the naturalness and intelligibility of the synthesized speech. Moreover, the model pretrained on Russian, whose linguistic structure is more similar to that of Uyghur, performed better than the model pretrained on English. These findings highlight the crucial role of typological similarity in cross-lingual TTS transfer, offering both a theoretical foundation and practical guidance for the development of low-resource speech synthesis systems.

Item Type: Thesis (Master)
Name supervisor: Do, T.P.
Date Deposited: 05 Nov 2025 10:00
Last Modified: 05 Nov 2025 10:00
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/775
