Optimizing Text-to-Speech: Investigating Training Data Volume for Human-Level Synthesis with Fastspeech2

Lei, Yi (2024) Optimizing Text-to-Speech: Investigating Training Data Volume for Human-Level Synthesis with Fastspeech2. Master thesis, Voice Technology (VT).

Abstract

This study investigates the relationship between training data volume and Text-to-Speech (TTS) system performance, focusing on the FastSpeech 2 model. I aim to determine the amount of data necessary to achieve human-level speech synthesis. Hypothesizing that Mean Opinion Scores (MOS) increase with data augmentation until reaching a human-level threshold, I conduct experiments with varying data volumes. Participants then subjectively rate synthesized speech samples alongside natural speech. The research aims to advance TTS technology by providing insights into the critical role of training data volume, particularly in low-resource language settings.
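
As an illustration of the evaluation described in the abstract, a Mean Opinion Score can be summarized as the average listener rating together with a confidence interval. The sketch below is not taken from the thesis: the function name `mean_opinion_score`, the 1 to 5 rating scale, and the normal-approximation 95% interval are all assumptions made for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of listener ratings.

    Assumptions (not from the thesis): ratings are numbers on a 1-5 scale,
    and the interval uses the normal approximation (z = 1.96).
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    mos = statistics.mean(ratings)
    if n < 2:
        # A single rating gives no estimate of spread.
        return mos, float("nan")
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(n)
    return mos, half_width
```

Under these assumptions, one would call the function once per training-data condition and once for the natural-speech recordings, then compare each condition's interval against the natural-speech interval to judge whether the human-level threshold has been approached.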

Item Type: Thesis (Master)
Name supervisor: Do, T.P.
Date Deposited: 23 Jul 2024 07:07
Last Modified: 23 Jul 2024 07:07
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/538
