From Zero-Shot to Fine-Tuned: Linguistic Error Analysis in Frisian ASR with Whisper

LI, Xinchi (2025) From Zero-Shot to Fine-Tuned: Linguistic Error Analysis in Frisian ASR with Whisper. Master thesis, Voice Technology (VT).

Abstract

Frisian is a low-resource language that shares close linguistic ties with Dutch, German, and English. Automatic Speech Recognition (ASR) projects for Frisian have long faced challenges such as limited availability of speech and transcription data, as well as low model accuracy. This study investigates how to effectively model Frisian using Whisper (small), a multilingual pre-trained model, through cross-lingual transfer learning. This approach leverages Whisper's built-in multilingual tokenizer, eliminating the need for Frisian-specific preprocessing. Additionally, we analyze the causes of recognition errors from a linguistic perspective after cross-lingual adaptation. We selected the Dutch, German, and English configurations of the Whisper model and conducted both zero-shot testing and fine-tuning experiments. Without fine-tuning, the Word Error Rates (WER) of the models were 90.84% (Dutch), 104.052% (German), and 111.954% (English). After fine-tuning on Frisian data, the WERs decreased substantially to 5.745% (Dutch), 5.877% (German), and 5.741% (English). These findings demonstrate the strong potential of cross-lingual transfer learning in Frisian ASR, especially when the source and target languages are closely related and structurally similar. High recognition accuracy was achieved without the need for additional language models or customized tokenizers. Linguistic analysis of the ASR errors revealed common issues such as language transfer effects, grammatical marker confusion, and phonetic similarity confusions. This study confirms the feasibility and efficiency of using multilingual pre-trained models for transfer learning in low-resource languages and provides insights into error types and future directions for low-resource ASR system development.

Keywords: Cross-lingual Transfer Learning, Frisian Speech Recognition, Linguistic Error Analysis
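
Note on the WER figures: the zero-shot values above 100% are possible because WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, so a hypothesis with many inserted words can accumulate more errors than the reference has words. The following is a minimal sketch of that standard definition, not the thesis's scoring pipeline. The function name `word_error_rate`, the whitespace tokenization, and the toy strings are assumptions made for illustration; the abstract does not state which text normalization or scoring tool was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy placeholder strings (not Frisian data): three reference words,
# five unrelated hypothesis words give 3 substitutions + 2 insertions.
wer = word_error_rate("a b c", "x y z w v")
print(f"{wer:.1%}")
```

With three reference words and five entirely wrong hypothesis words, the edit distance is 5, so the ratio is 5/3 and the reported WER exceeds 100%.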

Item Type: Thesis (Master)
Name supervisor: Coler, M.L.
Date Deposited: 08 Jul 2025 10:43
Last Modified: 08 Jul 2025 10:43
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/689
