
Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin

Chen, Hang (2025) Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin. Master thesis, Voice Technology (VT).

Full text (PDF): MA-5944562-H-Chen.pdf (965 kB)

Abstract

Depression is a global mental health challenge, and early detection remains a critical goal. While recent studies have applied self-supervised learning (SSL) models to speech-based depression detection, most have focused on monolingual English data. This study investigates whether acoustic cues related to depression generalize across languages by developing a cross-lingual detection framework based on HuBERT. Building on previous findings that HuBERT's middle layers perform well on emotion recognition across languages, this study explores which layers best encode depression-relevant representations. Using binary classification (depressed vs. non-depressed), HuBERT was fine-tuned separately on English (DAIC-WOZ) and Mandarin (CMDC) datasets. Model performance was evaluated on English, Mandarin, and mixed-language speech segments to assess the potential for cross-lingual transfer. Results suggest that the middle layers generalize more strongly and that mixed-language training improves transferability across distinct languages. These findings contribute to research on speech-based mental health detection and indicate that cross-lingual modeling may be effective across different language settings. Future work will explore how HuBERT layers encode clinical versus self-reported labels and how this affects generalization.
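To illustrate the kind of layer-wise setup the abstract describes, the sketch below probes a single HuBERT layer for binary depression classification. This is not the thesis's actual code: the checkpoint name, the choice of layer 6, the mean-pooling strategy, and the linear classifier head are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the thesis implementation): probe one
# HuBERT layer for binary depression classification.
import torch
import torch.nn as nn
from transformers import HubertModel

class LayerProbe(nn.Module):
    def __init__(self, layer_index: int = 6,
                 model_name: str = "facebook/hubert-base-ls960"):
        super().__init__()
        self.hubert = HubertModel.from_pretrained(model_name)
        self.layer_index = layer_index  # assumed middle layer; a hyperparameter to sweep
        # Simple mean-pooled linear head: depressed vs. non-depressed.
        self.classifier = nn.Linear(self.hubert.config.hidden_size, 2)

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        # hidden_states is a tuple: (CNN embeddings, layer 1, ..., layer 12) for the base model.
        outputs = self.hubert(input_values, output_hidden_states=True)
        layer_repr = outputs.hidden_states[self.layer_index]  # (batch, frames, hidden)
        pooled = layer_repr.mean(dim=1)                       # average over time
        return self.classifier(pooled)                        # (batch, 2) logits

# Usage with a dummy batch of two 3-second, 16 kHz waveforms.
probe = LayerProbe(layer_index=6)
waveforms = torch.randn(2, 3 * 16000)
logits = probe(waveforms)
print(logits.shape)  # torch.Size([2, 2])
```

In a layer-wise study of this sort, the same probe would typically be trained and evaluated once per hidden layer, and the per-layer scores compared across the English, Mandarin, and mixed-language test sets.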

Item Type: Thesis (Master)
Name supervisor: Verkhodanova, V.
Date Deposited: 28 Jul 2025 07:05
Last Modified: 28 Jul 2025 07:05
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/738
