Lei, Yi (2024) Optimizing Text-to-Speech: Investigating Training Data Volume for Human-Level Synthesis with Fastspeech2. Master thesis, Voice Technology (VT).
Abstract
This study investigates the relationship between training data volume and Text-to-Speech (TTS) system performance, focusing on the FastSpeech 2 model. I aim to determine the amount of data necessary to achieve human-level speech synthesis. Hypothesizing that Mean Opinion Scores (MOS) increase with data augmentation until reaching a human-level threshold, I conduct experiments with varying data volumes. Participants then subjectively rate synthesized speech samples alongside natural speech. The research aims to advance TTS technology by providing insights into the critical role of training data volume, particularly in low-resource language settings.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Do, T.P. |
Date Deposited: | 23 Jul 2024 07:07 |
Last Modified: | 23 Jul 2024 07:07 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/538 |