Weggeman, Sjors (2023) The relevance of using authentic laughter data in natural laughter synthesis: A case study on LaughNet. Master thesis, Voice Technology (VT).
PDF: MA 5007453 S Weggeman.pdf (1MB)
Abstract
The purpose of this research was to enhance the naturalness of synthesised speech by incorporating authentic laughter data into the laughter synthesis process of the state-of-the-art model LaughNet (Luong & Yamagishi, 2021). First, I confirmed that acted and spontaneous human laughter differ significantly at the acoustic level by training a Support Vector Machine (SVM) on their acoustic features. I then determined the acoustic features most relevant to the authenticity of laughter using factor analysis, which yielded three factors: 1) F0 mean, max., and var., 2) % unvoiced segments and intensity, and 3) F0 min. Of these factors, only the second is captured in LaughNet's data format, meaning that the most important, lower-level acoustic features have to be generated by the model. In theory, this means that the authenticity of LaughNet's training and fine-tuning data should hardly affect the naturalness of the synthetic laughter. However, this could not be confirmed because too little authentic laughter data was available. Future research with sufficient lab-collected data may be able to overcome this limitation by carefully selecting the generative model, data format, and training and fine-tuning data. Lastly, I checked whether human listeners were able to detect the authenticity of human laughter significantly above chance level. The perceived authenticity of isolated laughter appears to be contentious, suggesting the need for context as a way to disambiguate the authenticity judgments.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Coler, M.L. and Nayak, S. |
Date Deposited: | 12 Sep 2023 11:08 |
Last Modified: | 12 Sep 2023 11:08 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/367 |