Huang, Qiyan (2025) Towards Fine-Grained Emotional Modulation in FastSpeech 2 with Hierarchical Emotion Distributions. Master thesis, Voice Technology (VT).
Abstract
Emotional speech synthesis has made substantial progress; however, interpretable and fine-grained prosody control remains a persistent challenge. Existing systems often rely on global emotion labels or latent style embeddings, which limits precise temporal manipulation of emotional expression. This thesis introduces a novel approach to emotional prosody control by integrating phoneme-aligned Hierarchical Emotion Distributions (HED) into the non-autoregressive FastSpeech 2 architecture. The method enables interpretable emotion conditioning by injecting 12-dimensional HED vectors after the variance adaptor, supported by a gradual training strategy for stable convergence. Experiments were conducted on the English subset of the Emotional Speech Dataset (ESD) under multiple evaluation settings: sentence- and phoneme-level acoustic analysis, inference-time intensity manipulation, and perceptual testing via Best-Worst Scaling (BWS). Models were compared across emotion categories and training stages to assess control effectiveness and robustness. Results demonstrate that HED conditioning yields consistent, emotion-specific prosodic patterns with clearly distinguishable pitch and energy trajectories. Furthermore, inference-time manipulation of HED vectors results in predictable changes in emotional intensity, confirming the proposed system’s controllability. Subjective ratings align with acoustic findings, showing listener preference for HED-guided outputs. This research contributes a structured and interpretable framework for emotional speech synthesis, advancing the controllability of non-autoregressive TTS, and supports future applications in expressive voice technologies, virtual agents, and human-computer interaction.
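The conditioning idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The abstract states only that 12-dimensional, phoneme-aligned HED vectors are injected after the variance adaptor and can be manipulated at inference time. Everything else here is an assumption made for illustration: the additive linear projection, the hidden size of 256, the expansion of phoneme-level vectors to frames by repeating them according to durations, the [0, 1] value range, and the function names `inject_hed` and `scale_intensity`.

```python
import numpy as np

HED_DIM = 12      # dimensionality stated in the abstract
HIDDEN_DIM = 256  # assumed hidden size (hypothetical)


def inject_hed(hidden, hed, durations, proj_weight):
    """Add projected HED vectors to frame-level hidden states.

    hidden:      (num_frames, HIDDEN_DIM) output of the variance adaptor
    hed:         (num_phonemes, HED_DIM) phoneme-aligned emotion vectors
    durations:   (num_phonemes,) frames per phoneme (assumed alignment)
    proj_weight: (HED_DIM, HIDDEN_DIM) projection; learned in a real model
    """
    # Repeat each phoneme's HED vector for as many frames as it lasts.
    hed_frames = np.repeat(hed, durations, axis=0)
    if hed_frames.shape[0] != hidden.shape[0]:
        raise ValueError("durations must sum to the number of frames")
    # Additive conditioning (an assumption, not confirmed by the abstract).
    return hidden + hed_frames @ proj_weight


def scale_intensity(hed, factor):
    """Scale HED values at inference time, clipped to an assumed [0, 1] range."""
    return np.clip(hed * factor, 0.0, 1.0)


rng = np.random.default_rng(0)
durations = np.array([3, 2, 4, 1, 5])
num_frames = int(durations.sum())

hidden = rng.standard_normal((num_frames, HIDDEN_DIM))
hed = rng.uniform(0.0, 0.5, size=(len(durations), HED_DIM))
proj_weight = rng.standard_normal((HED_DIM, HIDDEN_DIM)) * 0.1

conditioned = inject_hed(hidden, hed, durations, proj_weight)
stronger = inject_hed(hidden, scale_intensity(hed, 1.5), durations, proj_weight)
```

In this sketch, changing `factor` changes only the emotion term added to the hidden states, which is one simple way that inference-time intensity control could be exposed to a user.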
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Verkhodanova, V. |
Date Deposited: | 16 Jun 2025 11:32 |
Last Modified: | 16 Jun 2025 11:32 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/666 |