
Speech Emotion Recognition via Multimodal CNN-LSTM Architectures

Chen, Yitong (2025) Speech Emotion Recognition via Multimodal CNN-LSTM Architectures. Master thesis, Voice Technology (VT).

Abstract

With the increasing integration of intelligent systems into daily life, emotion recognition has become a key component of affective computing. Sadness, in particular, holds practical significance due to its relevance to mental health screening and emotionally adaptive systems. However, detecting sadness from speech alone remains challenging, as it often manifests through subtle acoustic cues that are difficult to distinguish from neutral affect. This study proposes a multimodal CNN-LSTM framework that integrates audio and textual inputs for binary emotion classification (sad vs. non-sad). The architecture combines convolutional layers to extract local acoustic features, LSTM layers to capture temporal dependencies, and an attention mechanism to focus on emotionally salient segments. It is hypothesized that this multimodal approach will outperform unimodal (audio-only) baselines by at least 10 percent in classification performance. To test this, experiments were conducted on the IEMOCAP dataset using Session 5 as a held-out, speaker-independent test set. Models were evaluated using standard metrics, with emphasis on F1 score due to the dataset’s class imbalance. Results indicate that the attention-based multimodal model achieved a relative improvement of over 30 percent in F1 score compared to the best unimodal baseline. These findings suggest that multimodal architectures offer a promising direction for improving sadness detection and may contribute to the development of more context-sensitive, emotion-aware applications in future human-computer interaction systems.
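
The abstract reports its headline result as a relative improvement in F1 score. As a minimal sketch of how such a figure can be computed, the Python below derives F1 for the positive (sad) class from confusion counts and then compares two models. The function names and all counts are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (sad) class from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, candidate: float) -> float:
    """Gain of candidate over baseline, expressed as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (candidate - baseline) / baseline


# Hypothetical confusion counts, NOT taken from the thesis.
audio_only_f1 = f1_score(tp=40, fp=30, fn=40)
multimodal_f1 = f1_score(tp=56, fp=24, fn=24)

absolute_gain = multimodal_f1 - audio_only_f1
relative_gain = relative_improvement(audio_only_f1, multimodal_f1)

print(f"audio-only F1: {audio_only_f1:.4f}")
print(f"multimodal F1: {multimodal_f1:.4f}")
print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain:.4f}")
```

With these illustrative counts the absolute gain is about 0.17 F1 points while the relative gain is about 31 percent, which shows why it matters whether a reported improvement is absolute or relative to the baseline.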

Item Type: Thesis (Master)
Name supervisor: Gao, X.
Date Deposited: 05 Aug 2025 08:55
Last Modified: 05 Aug 2025 08:55
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/746
