Fine-Tuning Whisper for Dutch-Speaking Autistic Children: Adapting ASR to Atypical Speech in Low-Resource Settings

Yu, Hantao (2025) Fine-Tuning Whisper for Dutch-Speaking Autistic Children: Adapting ASR to Atypical Speech in Low-Resource Settings. Master thesis, Voice Technology (VT).

Abstract

Children with Autism Spectrum Disorder (ASD) often exhibit atypical prosody and disfluency patterns, which pose challenges for automatic speech recognition (ASR) systems. While large-scale models like Whisper have achieved strong general performance, their effectiveness on neurodivergent speech in low-resource languages remains underexplored. This study focuses on Dutch, a relatively underrepresented language in ASR research on ASD speech, and investigates how task-specific fine-tuning of the Whisper-medium model can improve recognition of Dutch speech from autistic children. The main experiment involves baseline fine-tuning across seven speaker group combinations (typically developing (TD) children, ADHD, ASD, and their mixes), complemented by exploratory experiments using parameter-efficient LoRA fine-tuning. Results show that fine-tuning significantly improves recognition performance, particularly when ASD speech is included in training. The best baseline configuration (TD+ASD+ADHD) reduced Word Error Rate (WER) from 43.12% (zero-shot) to 26.43%, while LoRA fine-tuning with ASD-only data further reduced WER to 23.20%, underscoring the impact of prosody-aligned training even under low-resource constraints. Error analysis revealed reductions in deletion and substitution errors, and better recognition of disfluencies such as fillers and repetitions. Statistical tests (e.g., Mann-Whitney U) confirmed the significance of performance differences across training conditions (p < 0.05), favoring ASD-inclusive models. These findings emphasize the importance of prosodic alignment and domain relevance in adapting ASR systems for neurodivergent speakers. This work contributes both methodologically, by comparing full and parameter-efficient fine-tuning strategies, and practically, by advancing inclusive speech recognition solutions for low-resource, underserved populations.
Keywords: Speech Recognition, Whisper, Fine-Tuning, Atypical Speech, Autism Spectrum Disorder (ASD), Child Speech, Prosody, Disfluency
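
For readers unfamiliar with the metric, the sketch below illustrates how a Word Error Rate of the kind reported in the abstract is typically computed (word-level edit distance divided by reference length) and how the reported figures translate into relative reductions. It is a minimal illustration only, not the thesis's evaluation code: the function names `word_error_rate` and `relative_reduction` and the Dutch example sentence are assumptions introduced here, and the thesis may apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative change between two error rates, as a fraction of `before`."""
    return (before - after) / before


if __name__ == "__main__":
    # Made-up example: one deleted word out of six reference words.
    print(word_error_rate("de kat zit op de mat", "de kat zit de mat"))

    # WER figures taken from the abstract (percent).
    print(relative_reduction(43.12, 26.43))  # zero-shot vs. best baseline fine-tuning
    print(relative_reduction(43.12, 23.20))  # zero-shot vs. LoRA with ASD-only data
```

With the figures from the abstract, the absolute drops of 16.69 and 19.92 percentage points correspond to relative WER reductions of roughly 39% and 46% against the zero-shot model.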

Item Type:	Thesis (Master)
Name supervisor:	Gao, X.
Date Deposited:	16 Jun 2025 11:04
Last Modified:	16 Jun 2025 11:04
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/658
