
Enhancing Whisper's Zero-Shot Capabilities for Code-Switching through Fine-Tuning

Miao, Haolin (2025) Enhancing Whisper's Zero-Shot Capabilities for Code-Switching through Fine-Tuning. Master thesis, Voice Technology (VT).

PDF: Master-ThesisHaolinMiao.pdf (2MB)

Abstract

Automatic Speech Recognition (ASR) for code-switched speech, particularly speech involving dialects such as Cantonese mixed with English, remains a significant challenge for pre-trained models. This study investigates the efficacy of fine-tuning as a domain adaptation strategy for OpenAI's Whisper models on this task. A comparative analysis was conducted by fine-tuning two models, whisper-small and whisper-large-v3, on a Cantonese-English code-switching dataset and evaluating their performance against their respective zero-shot baselines. The experiments were performed on the MCE dataset, with Word Error Rate (WER) and Character Error Rate (CER) as the primary evaluation metrics. The results demonstrate that fine-tuning yields substantial performance improvements for both models; the whisper-small model, in particular, achieved a significant drop in WER and an improvement in CER. Furthermore, the study reveals a relationship between model scale and task-specific performance: while whisper-large-v3 also improved over its zero-shot baseline, its final word-level accuracy did not surpass that of the fine-tuned small model. This outcome suggests that the larger model was prone to overfitting on the medium-resource dataset, learning surface-level patterns without generalizing effectively. The conclusion is that for specialized ASR tasks such as Cantonese-English code-switching, a smaller, more constrained model can offer a more effective pathway to robust performance, highlighting the critical trade-off between model capacity and generalization.

Keywords: Automatic Speech Recognition (ASR), Code-Switching, Whisper Model, Fine-Tuning
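The evaluation protocol described in the abstract (transcribing a code-switched test set with a zero-shot and a fine-tuned Whisper checkpoint, then comparing WER and CER) can be illustrated with a minimal sketch using Hugging Face Transformers and jiwer. This is not the thesis code; the fine-tuned checkpoint path, audio file names, and reference transcripts below are hypothetical placeholders.

```python
# Minimal sketch (assumed setup, not the author's implementation):
# score a zero-shot and a fine-tuned Whisper checkpoint on WER/CER.
from transformers import pipeline
import jiwer


def transcribe(model_name: str, audio_paths: list[str]) -> list[str]:
    """Run Whisper inference on a list of audio files."""
    asr = pipeline("automatic-speech-recognition", model=model_name)
    return [asr(path)["text"] for path in audio_paths]


def score(references: list[str], hypotheses: list[str]) -> dict:
    """Compute corpus-level WER and CER."""
    return {
        "wer": jiwer.wer(references, hypotheses),
        "cer": jiwer.cer(references, hypotheses),
    }


# Hypothetical test clips and Cantonese-English reference transcripts.
audio_paths = ["clip_001.wav", "clip_002.wav"]
references = ["今日我哋去 shopping", "份 report 幾時交"]

# Zero-shot baseline vs. a (hypothetical) fine-tuned checkpoint.
for checkpoint in ["openai/whisper-small", "./whisper-small-canto-en"]:
    hypotheses = transcribe(checkpoint, audio_paths)
    print(checkpoint, score(references, hypotheses))
```

Reporting both WER and CER matters here because word boundaries are ill-defined for the Cantonese portion of code-switched text, so CER gives a complementary, character-level view of recognition accuracy.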

Item Type: Thesis (Master)
Supervisor: Schauble, J.K.
Date Deposited: 19 Sep 2025 13:54
Last Modified: 19 Sep 2025 13:54
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/665
