Jiang, Weihao (2024) Synthesis of sarcastic speech: Research on adjusting pitch and energy at keyword level using FastSpeech2. Master thesis, Voice Technology (VT).
PDF: WeihaoJiang.pdf (896kB)
Abstract
Sarcasm is one of the most common and important rhetorical techniques in daily life, and the synthesis of sarcastic speech is a crucial aspect of emotional speech synthesis that warrants attention. While previous research has focused extensively on the detection and recognition of sarcasm, there has been less emphasis on the synthesis of sarcastic speech. Therefore, this thesis explores how to use a large language model (LLM), ChatGPT, to predict sarcastic keywords within sentences, and synthesizes sarcastic speech by precisely controlling the pitch and energy of these keywords using FastSpeech2. This research helps fill a gap in the field of sarcastic speech synthesis. Additionally, an evaluation involving 22 native or second-language English speakers validated the practicality and effectiveness of this method in enhancing the recognition and synthesis of sarcastic tones. Experimental results demonstrated that controlling the acoustic features of keywords alone within a sentence can significantly improve listeners' perception of sarcasm, compared to global-level pitch and energy control. The results of this experiment can be viewed on the GitHub page: https://weihaohaoao.github.io/weihao.github.io. The specific method of synthesizing sarcastic audio and the modifications to the model in this study are open-sourced and can be found on my GitHub page: https://github.com/Weihaohaoao/Synthesis-sarcastic-voice/tree/main
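The contrast between keyword-level and global-level control described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the per-phoneme feature layout, the word spans, and the scaling factor of 1.5 are all assumptions chosen for illustration, and the keyword index is supplied by hand rather than predicted by an LLM.

```python
from typing import List, Sequence, Tuple


def scale_keyword_features(
    values: Sequence[float],
    spans: Sequence[Tuple[int, int]],
    keyword_indices: Sequence[int],
    factor: float,
) -> List[float]:
    """Scale values only inside the spans of the chosen keywords.

    values: one pitch or energy value per phoneme (assumed layout).
    spans: (start, end) phoneme index range per word, end exclusive.
    keyword_indices: indices of the words treated as sarcastic keywords.
    factor: multiplicative control factor (illustrative value only).
    """
    scaled = list(values)
    for word_idx in keyword_indices:
        start, end = spans[word_idx]
        for i in range(start, end):
            scaled[i] = values[i] * factor
    return scaled


def scale_globally(values: Sequence[float], factor: float) -> List[float]:
    """Scale every value in the sentence by the same factor."""
    return [v * factor for v in values]


# Toy sentence of three words with 2, 3 and 3 phonemes (hypothetical data).
pitch = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0]
spans = [(0, 2), (2, 5), (5, 8)]

# Keyword-level control: only the second word is modified.
keyword_pitch = scale_keyword_features(pitch, spans, keyword_indices=[1], factor=1.5)

# Global-level control: the whole sentence is modified.
global_pitch = scale_globally(pitch, 1.5)
```

The same helper could be applied to an energy contour. In this sketch the keyword-level version leaves the surrounding words untouched, which is the property the abstract contrasts with global control.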
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Zhu, L. |
Date Deposited: | 13 Aug 2024 08:53 |
Last Modified: | 13 Aug 2024 08:53 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/548 |