An Innovative Method for Multi-Effect Speech Synthesis through Training File Modification

Wei, Yilan (2024) An Innovative Method for Multi-Effect Speech Synthesis through Training File Modification. Master thesis, Voice Technology (VT).

Abstract

Human speakers naturally and flexibly adjust speech rate, intonation, and voice intensity during communication. However, current speech synthesis research often models these dynamic changes inadequately. Most existing studies focus on generating audio with a specific emotional tone (e.g., happy, sad, angry), and few address synthesizing audio with varied speech modifications, such as changes in speech rate and pitch within a single sentence. To address this gap, this study proposes an innovative method for multi-effect speech synthesis with the FastSpeech2 model that precisely modifies the training files and the corresponding audio data. Experimental results demonstrate that this approach significantly improves the model's ability to reproduce target speech modifications, with strong performance in Chinese, English, and Spanish. Numerical analyses and manual listening assessments confirm that the model responds sensitively and accurately to speech rate adjustments. The results across the three languages also support the cross-linguistic generalizability and validity of the method, suggesting a wide range of potential applications. The method is expected to contribute to more emotionally expressive and diverse audio synthesis and to advance speech synthesis technology.

Item Type:	Thesis (Master)
Name supervisor:	Coler, M.L. and Nayak, S.
Date Deposited:	06 Aug 2024 08:02
Last Modified:	06 Aug 2024 08:02
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/544
