Sun, Shiran (2025) A Comparative Evaluation of Closed- and Open-Vocabulary ASR Systems for the Recognition of Dutch Healthcare Terms. Master thesis, Voice Technology (VT).
Abstract
Automatic Speech Recognition (ASR) technology is becoming more prevalent in clinical settings, but the performance of closed- and open-vocabulary ASR models on domain-specific speech in healthcare is not well studied. In this paper, we present a comparative evaluation of ASR systems operating in closed- and open-vocabulary settings for the recognition of Dutch clinical terminology. We consider a closed-vocabulary Kaldi TDNN model and an open-vocabulary Pruned RNN-Transducer (K2-RNN-T), both trained on more than 1000 hours of Dutch speech, including 12 hours of domain-specific training data. We evaluate both systems on a professionally transcribed Dutch medical consultation corpus containing over 8000 utterances, using standard evaluation metrics (WER, CER), domain-specific evaluation metrics (Medical WER and CER), and term-level evaluation (precision, recall, F1 score). Clinical and contextual entities are extracted from the transcriptions using SNOMED CT and spaCy-based Named Entity Recognition (NER). We find that, in general, the closed-vocabulary model obtains better recognition results for structured medical terms, such as diseases and drug names: its precision and F1 score are higher, while its Medical WER and CER are lower. The open-vocabulary model, on the other hand, has better recall and general transcription accuracy and seems more flexible in handling morphologically varied or unknown terms. This study also uncovers notable error types such as phonetic substitutions, semantic approximations, and truncations, each with distinct clinical implications. Results highlight a trade-off between lexical accuracy and adaptability: while the closed-vocabulary model ensures stability for structured content, the open-vocabulary model captures a broader lexical range, including personal and brand names often missed by fixed lexicons.
This work shows the strengths of ASR approaches in both closed- and open-vocabulary settings and motivates task-specific optimisation in medical speech applications. The evaluation framework presented here can be adapted for other low-resource languages and specialised domains.
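As a rough illustration of the metric families named in the abstract, the sketch below computes standard WER and CER from an edit distance, plus term-level precision, recall, and F1 against a term list. It is not the thesis's evaluation code: the function names, the whitespace tokenisation, the single-word term matching, and the example sentence are all assumptions made for illustration, and the thesis's Medical WER/CER definitions are not reproduced here.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def term_prf(reference, hypothesis, term_list):
    """Precision, recall, F1 over single-word terms from a (hypothetical) term list."""
    terms = set(term_list)
    ref_terms = Counter(w for w in reference.split() if w in terms)
    hyp_terms = Counter(w for w in hypothesis.split() if w in terms)
    true_pos = sum((ref_terms & hyp_terms).values())  # multiset overlap
    precision = true_pos / sum(hyp_terms.values()) if hyp_terms else 0.0
    recall = true_pos / sum(ref_terms.values()) if ref_terms else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example utterance; "paracetamal" simulates a phonetic substitution.
    ref = "mevrouw gebruikt metformine en paracetamol"
    hyp = "mevrouw gebruikt metformine en paracetamal"
    print(wer(ref, hyp), cer(ref, hyp))
    print(term_prf(ref, hyp, ["metformine", "paracetamol"]))
```

In this toy case a single misrecognised drug name changes the word error rate only modestly but halves term recall, which is the kind of gap between general and term-level metrics the abstract points to.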
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Schauble, J.K. |
Date Deposited: | 16 Jun 2025 11:12 |
Last Modified: | 16 Jun 2025 11:12 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/662 |