
Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin

Chen, Hang (2025) Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin. Master thesis, Voice Technology (VT).

Full text (PDF): MA-5944562-H-Chen.pdf (965 kB)

Abstract

Depression is a global mental health challenge, and early detection remains a critical goal. While recent studies have applied self-supervised learning (SSL) models to speech-based depression detection, most have focused on monolingual English data. This study investigates whether acoustic cues related to depression generalize across languages by developing a cross-lingual detection framework based on HuBERT. Building on previous findings that HuBERT's middle layers perform well on emotion recognition across languages, this study explores which layers best encode depression-relevant representations. Using binary classification (depressed vs. non-depressed), HuBERT was fine-tuned separately on English (DAIC-WOZ) and Mandarin (CMDC) datasets. Model performance was evaluated on English, Mandarin, and mixed-language speech segments to assess the potential for cross-lingual transfer. Results suggest that the middle layers generalize more strongly and that mixed-language training improves transferability across distinct languages. These findings contribute to research on speech-based mental health detection and indicate that cross-lingual modeling may be effective across different language settings. Future work will explore how HuBERT layers encode clinical versus self-reported labels and how this affects generalization.
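To illustrate the kind of layer-wise setup the abstract describes, the sketch below probes a single HuBERT layer for binary depression classification. This is not the thesis's actual code: the checkpoint name, the choice of layer 6, the mean-pooling strategy, and the linear classifier head are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the thesis implementation): probe one
# HuBERT layer for binary depression classification.
import torch
import torch.nn as nn
from transformers import HubertModel

class LayerProbe(nn.Module):
    def __init__(self, layer_index: int = 6,
                 model_name: str = "facebook/hubert-base-ls960"):
        super().__init__()
        self.hubert = HubertModel.from_pretrained(model_name)
        self.layer_index = layer_index  # assumed middle layer; a hyperparameter to sweep
        # Simple mean-pooled linear head: depressed vs. non-depressed.
        self.classifier = nn.Linear(self.hubert.config.hidden_size, 2)

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        # hidden_states is a tuple: (CNN embeddings, layer 1, ..., layer 12) for the base model.
        outputs = self.hubert(input_values, output_hidden_states=True)
        layer_repr = outputs.hidden_states[self.layer_index]  # (batch, frames, hidden)
        pooled = layer_repr.mean(dim=1)                       # average over time
        return self.classifier(pooled)                        # (batch, 2) logits

# Usage with a dummy batch of two 3-second, 16 kHz waveforms.
probe = LayerProbe(layer_index=6)
waveforms = torch.randn(2, 3 * 16000)
logits = probe(waveforms)
print(logits.shape)  # torch.Size([2, 2])
```

In a layer-wise study of this sort, the same probe would typically be trained and evaluated once per hidden layer, and the per-layer scores compared across the English, Mandarin, and mixed-language test sets.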

Item Type: Thesis (Master)
Name supervisor: Verkhodanova, V.
Date Deposited: 28 Jul 2025 07:05
Last Modified: 28 Jul 2025 07:05
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/738
