Cai, Youyang (2024) Multimodal sarcasm recognition based on different feature fusion methods. Master thesis, Voice Technology (VT).
Abstract
In the current digital age, where virtual assistants are widespread and sarcasm is frequently used online, the detection of sarcasm has become a critical challenge. Traditional sarcasm detection methods have largely focused on textual data, but the emergence of multimodal analysis has introduced new dynamics into the field, reflecting the complex nature of human communication where sarcasm often involves discrepancies across different modalities. For example, a sarcastic remark might be made with a positive tone but accompanied by a sarcastic facial expression. This thesis introduces the Contrastive-Attention-based (ConAtt) model, designed to enhance sarcasm detection by leveraging a cross-modal contrastive attention mechanism. This model effectively captures and analyzes inconsistencies between modalities, such as textual praise accompanied by a complaining tone, by extracting several contrastive features from the discourse. Experimental validations on the Multimodal Sarcasm Detection Dataset (MUStARD) demonstrate the ConAtt model’s effectiveness, marking a significant advancement over traditional sarcasm detection approaches. The ConAtt model exhibits substantial improvements in key performance metrics including Precision, Recall and F-Score, highlighting the benefits of integrating multimodal data for detecting sarcasm. This study not only emphasizes the importance of multimodal integration but also validates the efficacy of the contrastive attention mechanism in parsing complex and subtle cues across various communication channels. The capabilities of the ConAtt model are pivotal for the accurate identification and interpretation of sarcasm, offering substantial potential for applications that require a nuanced understanding of human interactions. These findings lay a robust foundation for future research in multimodal sarcasm detection, opening new avenues for exploring more intricate and nuanced forms of communication. This work underscores the growing relevance of advanced multimodal analysis techniques in the broader context of natural language processing and human-computer interaction.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Gao, X. |
Date Deposited: | 12 Jul 2024 07:18 |
Last Modified: | 12 Jul 2024 07:18 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/511 |