Chen, Hang (2025) Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin. Master thesis, Voice Technology (VT).
PDF: MA-5944562-H-Chen.pdf (965kB)
Abstract
Depression is a global mental health challenge, and early detection remains a critical goal. While recent studies have applied self-supervised learning (SSL) models to speech-based depression detection, most have focused on monolingual English data. This study investigates whether acoustic cues related to depression generalize across languages by developing a cross-lingual detection framework based on HuBERT. Building on previous findings that HuBERT’s middle layers perform well on emotion recognition across languages, this study examines which layers best encode depression-relevant information. HuBERT was fine-tuned for binary classification (depressed vs. non-depressed) separately on English (DAIC-WOZ) and Mandarin (CMDC) datasets. Model performance was evaluated on English, Mandarin, and mixed-language speech segments to assess the potential for cross-lingual transfer. Results suggest that middle layers generalize better than other layers and that mixed-language training improves transferability across distinct languages. These findings contribute to research on speech-based mental health detection and indicate that cross-lingual modeling may be effective across languages. Future work will explore how HuBERT layers encode clinical versus self-reported labels and how this affects generalization.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Verkhodanova, V. |
Date Deposited: | 28 Jul 2025 07:05 |
Last Modified: | 28 Jul 2025 07:05 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/738 |