Miao, Haolin (2025) Enhancing Whisper's Zero-Shot Capabilities for Code-Switching through Fine-Tuning. Master thesis, Voice Technology (VT).
Abstract
Automatic Speech Recognition (ASR) for code-switched speech, particularly involving dialects like Cantonese mixed with English, remains a significant challenge for pre-trained models. This study investigates the efficacy of fine-tuning as a domain adaptation strategy for OpenAI's Whisper models on this task. Two models, whisper-small and whisper-large-v3, were fine-tuned on a Cantonese-English code-switching dataset and compared against their respective zero-shot baselines. The experiments were performed on the MCE dataset, with Word Error Rate (WER) and Character Error Rate (CER) as the primary evaluation metrics. The results demonstrate that fine-tuning yields substantial performance improvements for both models. The whisper-small model, in particular, showed a significant drop in WER and an improvement in CER. The study also reveals a relationship between model scale and task-specific performance: while the whisper-large-v3 model improved upon its zero-shot baseline, its final word-level accuracy did not surpass that of the fine-tuned small model. This outcome suggests that the larger model was prone to overfitting on the medium-resource dataset, learning surface-level patterns without generalizing effectively. The conclusion is that for specialized ASR tasks such as Cantonese-English code-switching, a smaller, more constrained model can offer a more effective pathway to robust performance, highlighting the critical trade-off between model capacity and generalization.

Keywords: Automatic Speech Recognition (ASR), Code-Switching, Whisper Model, Fine-Tuning
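For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of their standard definitions: the edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The abstract does not say which tooling or text normalization the thesis used, so the function names (`edit_distance`, `wer`, `cer`), the whitespace tokenization, and the example sentence are all assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words.

    Assumption: tokens are separated by whitespace.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference characters.

    Assumption: spaces are ignored when counting characters.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Invented code-switched sentence, not taken from the MCE dataset.
reference = "我 想 book 一 張 ticket"
hypothesis = "我 想 book 一 張 飛"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this invented example a single wrong word gives a WER of 1/6, while the CER is 6/14 because the English word "ticket" counts as six characters. This illustrates why the two metrics can move differently on mixed Cantonese-English text.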
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Schauble, J.K. |
Date Deposited: | 19 Sep 2025 13:54 |
Last Modified: | 19 Sep 2025 13:54 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/665 |