Galarneau, Jocomin (2024) Synthesizing Anger: Enhancing Emotional Speech from Text in Novel Dialogues. Master thesis, Voice Technology (VT).
Abstract
This thesis aims to enhance the synthesis of emotional speech from text, focusing on anger as portrayed in novel dialogues. Building upon advancements in emotion recognition and speech synthesis technologies, the research maps textual emotion descriptors to the dimensional parameters of arousal and valence in order to authentically synthesize the nuances of anger. Using Ekman and Cordaro's (2011) basic emotions, Russell's (1980) circumplex model, and the Expressive-FastSpeech2 speech synthesis model (K. Lee, 2021), the study intends to generate speech that faithfully represents varying levels of anger, based on novel dialogue contexts and sentiments. In an experimental setup, 12 anger-labeled and 8 neutral-labeled lines from "Alice in Wonderland" (Carroll, 2006) were synthesized into groups with different intensity levels. Twenty participants were then asked to judge up to two emotions and their intensities within the synthesized samples on a scale from 1 to 7. Results indicate a 55.25% participant recognition rate for the intended emotion in the audio samples and a 32.65% rate for intensity levels, counting scores that fell within one value of the correct level. While this thesis does not conclude that the current methodology consistently creates authentic, nuanced emotions, it reaffirms the importance of textual context for novel dialogue, highlights the variability in individual emotional perceptions, and provides groundwork for future studies to refine and build upon.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Nayak, S. |
Date Deposited: | 18 Jul 2024 08:42 |
Last Modified: | 18 Jul 2024 08:42 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/532 |