Age-controllable speech synthesis: A pilot study on English

Vanni, Alice (2024) Age-controllable speech synthesis: A pilot study on English. Master thesis, Voice Technology (VT).

Abstract

This research attempts to implement age control in a text-to-speech (TTS) system, so that the perceived age of the synthetic voice can be changed while the perceived speaker identity is preserved. The system uses a non-autoregressive multi-speaker TTS model, namely FastSpeech2 (Chien et al., 2021), and was inspired by the pipeline outlined for ChildTTS (Jain et al., 2022). It uses Resemblyzer, a pre-trained speaker encoder, and includes an age encoder to extract the embedding vectors used to generate speech by children, adults and elderly people. The system is developed for English using a corpus drawn from the Common Voice 17.0 English dataset (Ardila et al., 2020) and the My Science Tutor corpus (Pradhan et al., 2023). The model's performance was evaluated by acoustic analysis of the synthetic speech features and by calculating Mel-Cepstral Distortion. The proposed system is designed to enhance the customisation of Speech Generating Devices (SGDs) and, additionally, to tackle the challenge of developing TTS systems for non-standard voices. The outcome of this research contributes to the broader understanding of voice personalisation techniques and may also provide new insight into the impact of the ageing process on the voice. This may benefit the industry by enabling more efficient creation of tailored voices, e.g. for voice assistants and vocal personas, and may benefit SGD users. Age is an integral part of identity, and the ability to recreate a synthesised voice that a person identifies with can be an invaluable tool for those who have lost the ability to speak naturally.
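
For readers unfamiliar with the Mel-Cepstral Distortion metric mentioned in the abstract, the following is a minimal sketch of one common formulation, not the thesis's own implementation. It assumes that the reference and synthetic utterances have already been converted to mel-cepstral coefficient matrices of identical shape and time-aligned frame by frame (for example with dynamic time warping), and that column 0 holds the energy term, which is excluded. The function name `mel_cepstral_distortion` is hypothetical.

```python
import numpy as np


def mel_cepstral_distortion(ref, syn):
    """Mean frame-wise MCD in dB between two aligned coefficient matrices.

    Assumptions (not taken from the thesis): both inputs have shape
    (frames, coefficients), are already time-aligned, and column 0 is the
    energy coefficient, which is skipped.
    """
    ref = np.asarray(ref, dtype=float)
    syn = np.asarray(syn, dtype=float)
    if ref.ndim != 2 or ref.shape != syn.shape:
        raise ValueError("inputs must be 2-D arrays with identical shapes")

    # Drop the 0th (energy) coefficient before comparing frames.
    diff = ref[:, 1:] - syn[:, 1:]

    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum of squared differences).
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))

    # Average over all frames to get one score per utterance pair.
    return float(np.mean(per_frame))
```

Under this formulation a lower value indicates that the synthetic speech is spectrally closer to the reference, and identical inputs yield a distortion of zero.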

Item Type: Thesis (Master)
Name supervisor: Do, T.P.
Date Deposited: 12 Jun 2024 08:38
Last Modified: 13 Jun 2024 11:43
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/469
