Enhanced Multimodal Emotion Recognition using GRU and Self-Attention Mechanisms: Techniques and Applications

Shi, Jingwen (2024) Enhanced Multimodal Emotion Recognition using GRU and Self-Attention Mechanisms: Techniques and Applications. Master thesis, Voice Technology (VT).

PDF: MA-5718902-J-Shi.pdf (5MB)

Abstract

This thesis contributes to the field of multimodal emotion recognition by developing and evaluating models that integrate audio, visual, and textual data. We used state-of-the-art feature extraction techniques, including BERT for text, LibROSA for audio, and OpenFace for visual cues, to obtain a comprehensive representation of the multimodal data. A novel temporal alignment technique was introduced to synchronize features across modalities, ensuring coherent integration and enhancing the model's ability to capture intricate cross-modal relationships.

The proposed architecture combines Gated Recurrent Units (GRUs) with self-attention mechanisms, capturing both local and global dependencies and thereby improving feature extraction and emotion recognition accuracy. A stacking fusion module amalgamates information from the text, audio, and visual modalities, yielding superior performance across multiple datasets, including CMU-MOSI, CMU-MOSEI, and CH-SIMS. Extensive evaluation demonstrated substantial improvements over baseline models, validating the effectiveness of the proposed methods in achieving higher accuracy and robustness in emotion recognition.

The research has significant practical implications, setting a new benchmark for emotion recognition. The developed system enhances human-computer interaction, provides multilingual support in virtual assistants, and assists language learners, thereby contributing to the preservation of linguistic diversity and cultural heritage. It also contributes to the development of socially intelligent and empathetic artificial systems, paving the way for more advanced applications in affective computing.

In conclusion, this thesis advances multimodal emotion recognition through innovative methods and comprehensive evaluation. The findings underscore the importance of integrating multiple data modalities and provide a solid foundation for future research and practical applications, offering pathways for continued innovation in recognizing and understanding human emotions.
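The abstract's core architectural idea, a GRU encoder whose hidden states are then weighted by self-attention, can be illustrated in miniature. The following pure-Python sketch is a hypothetical toy (random weights, tiny dimensions, Q = K = V with no learned projections), not the thesis's actual implementation; it only shows how the two components compose: each timestep's features pass through a standard GRU cell, and scaled dot-product self-attention then lets every hidden state attend to all others before pooling.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate state n."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    # Interpolate between the candidate and the previous hidden state.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def self_attention(H):
    """Scaled dot-product self-attention with Q = K = V = H (no projections)."""
    d = len(H[0])
    out = []
    for q in H:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in H]
        m = max(scores)                       # softmax with max-shift for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        out.append([sum(w[i] * H[i][j] for i in range(len(H))) for j in range(d)])
    return out

# Hypothetical toy setup: 3-dim features, a 5-step sequence, random weights.
random.seed(0)
dim = 3
p = {k: [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
     for k in ["Wz", "Uz", "Wr", "Ur", "Wn", "Un"]}
seq = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]

h = [0.0] * dim
states = []
for x in seq:                # GRU captures local (sequential) dependencies
    h = gru_step(x, h, p)
    states.append(h)

attended = self_attention(states)   # attention captures global dependencies
pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean-pool to one vector
```

In the thesis's setting, `seq` would be the per-modality features (BERT, LibROSA, or OpenFace outputs), and the pooled vectors from the three modalities would feed the stacking fusion module; a production model would additionally learn query/key/value projections and train all weights.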

Item Type: Thesis (Master)
Supervisor: Nayak, S.
Date Deposited: 22 Jul 2024 07:25
Last Modified: 22 Jul 2024 07:25
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/535
