Multimodal Sarcasm Detection Using BERT, TimesFormer, and Wav2Vec 2.0 with MUStARD++

Shi, Erin (2024) Multimodal Sarcasm Detection Using BERT, TimesFormer, and Wav2Vec 2.0 with MUStARD++. Master thesis, Voice Technology (VT).

Abstract

Sarcasm detection in speech is challenging because sarcasm relies heavily on conversational cues and tonal subtleties. This thesis explores whether sarcasm detection can be improved by incorporating multimodal data, specifically textual, audio, and visual information, using an extended BERT (Bidirectional Encoder Representations from Transformers) model fine-tuned on the MUStARD++ dataset. The research adopts an early fusion approach, in which features from the different modalities are integrated at the initial stages of the processing pipeline: all features from each modality are combined, typically through concatenation, before being passed to the model for training. TimesFormer was employed to extract features from video data and Wav2Vec 2.0 to extract features from audio data. The underlying hypothesis is that a multimodal approach can capture the nuanced expressions of sarcasm more effectively than single-modality approaches. The approach is evaluated on several metrics, including precision, recall, and F1-score. The findings indicate that the multimodal approach significantly enhances the model's ability to detect sarcasm, particularly in complex scenarios where unimodal models struggle. The integration of multimodal data not only enriches the feature set but also mirrors how humans perceive sarcasm, which draws not only on the literal words but also on paralinguistic cues (e.g., facial expressions, prosody). The findings suggest potential for further exploration, such as improving real-time sarcasm detection in conversational AI, enhancing sentiment analysis in social media monitoring tools, and developing more advanced virtual assistants capable of understanding nuanced human emotions.
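
As a rough illustration of the early fusion and evaluation steps described in the abstract, the following minimal Python sketch concatenates per-modality feature vectors into one fused vector and computes precision, recall, and F1-score for binary labels. It is not the thesis implementation: the function names (early_fusion, precision_recall_f1), the toy feature values, and the vector sizes are all assumptions made for the example. In practice the vectors would come from the text, audio, and video encoders and would be far larger.

```python
from typing import List, Sequence, Tuple


def early_fusion(text_vec: Sequence[float],
                 audio_vec: Sequence[float],
                 video_vec: Sequence[float]) -> List[float]:
    """Concatenate per-modality features into a single fused vector.

    Hypothetical helper: the fused vector is what would be handed to
    the downstream classifier in an early fusion setup.
    """
    return list(text_vec) + list(audio_vec) + list(video_vec)


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy feature vectors (made-up values and sizes, for illustration only).
text_vec = [0.1, 0.2, 0.3]
audio_vec = [0.4, 0.5]
video_vec = [0.6]

fused = early_fusion(text_vec, audio_vec, video_vec)
print(len(fused))  # length is the sum of the three input lengths

# Toy labels: 1 = sarcastic, 0 = not sarcastic.
scores = precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1])
print(scores)
```

Concatenation keeps every modality's features visible to the classifier from the first layer onward, which is the defining property of early fusion; the cost is that the fused vector grows with each modality added.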

Item Type: Thesis (Master)
Name supervisor: Coler, M.L.
Date Deposited: 09 Jul 2024 06:44
Last Modified: 09 Jul 2024 06:44
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/502
