Wildenburg, Kirsten (2022) Automatic speech recognition and error analyses of Dutch oral cancer speech. Master thesis, Voice Technology (VT).
Abstract
Approximately 500,000 people are diagnosed with oral cancer yearly (Shield et al., 2016), and the treatment of oral cancer often leads to impaired speech intelligibility (Lazarus et al., 2014). Automatic speech recognition (ASR) systems could improve oral cancer survivors' quality of life, since they could ease communication and could also be applied clinically (Windrich et al., 2008). Therefore, this study aims to investigate which phonemes cause higher recognition error rates in standard end-to-end (E2E) ASR systems for oral cancer speech compared to healthy speech in Dutch, as well as the influence of the surgical treatment on ASR performance. We use the ESPnet E2E ASR system, which adopts a hybrid CTC-attention architecture in combination with a Conformer model that was pre-trained on the CGN corpus containing healthy Dutch speech. After running our Dutch oral cancer speech dataset through the ASR system, we perform an extensive error analysis at both the phoneme and the articulatory feature level. In agreement with the literature (e.g. Halpern et al., 2022), our results reveal that the E2E ASR system performs significantly worse for oral cancer speech than for healthy speech. In particular, the production of /k/ elicits higher recognition error rates in oral cancer speech, which is in line with previous research (e.g. Borggreven et al., 2005; de Bruijn et al., 2009). Our articulatory feature analysis supports these findings, as it shows that velar consonants are the second most challenging articulatory feature class to recognize in oral cancer speech, and that plosives are the manner of articulation misrecognized most frequently by the ASR system. Although previous studies report that sibilants are misrecognized in oral cancer speech (e.g. Laaksonen et al., 2011), our results do not show sibilants to be more challenging for the ASR system to capture in oral cancer speech, which is in accordance with the findings of Halpern et al. (2022).
In addition, the speech of oral cancer patients who underwent a mandibulectomy seems to obtain higher recognition error rates than the speech of patients who underwent a (partial) glossectomy, although the difference between WERs fails to reach significance. The findings of this study contribute to the development of Dutch ASR systems for oral cancer speech.
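The error rates mentioned in the abstract (WER, and per-phoneme recognition errors) are conventionally derived from an edit-distance alignment between a reference and a recognized sequence. The following is a minimal sketch of that general technique, not the thesis's actual tooling: the function names `align_errors` and `error_rate` are hypothetical, the example tokens are illustrative rather than taken from the thesis data, and insertions are counted in the total but not attributed to any reference token.

```python
from collections import Counter


def align_errors(ref, hyp):
    """Align two token lists with Levenshtein distance.

    Returns (substitutions, deletions, insertions, errors), where `errors`
    counts how often each reference token was substituted or deleted.
    Hypothetical helper for illustration only.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion of a reference token
                dp[i][j - 1] + 1,         # insertion of a hypothesis token
            )

    # Trace back through the table to classify each edit.
    subs = dels = ins = 0
    errors = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = 1 if (i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]) else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + mismatch:
            if mismatch:
                subs += 1
                errors[ref[i - 1]] += 1
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            errors[ref[i - 1]] += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins, errors


def error_rate(ref, hyp):
    """(S + D + I) / N, assuming a non-empty reference sequence."""
    subs, dels, ins, _ = align_errors(ref, hyp)
    return (subs + dels + ins) / len(ref)


# Word level: one deleted word out of three reference words.
print(error_rate("de kat zit".split(), "de zit".split()))

# Phoneme level: /k/ recognized as /t/ is counted against /k/.
print(align_errors(["k", "a", "t"], ["t", "a", "t"])[3])
```

Applied to word sequences this gives a WER; applied to phoneme sequences, the per-token counter gives the kind of tally from which one could see that a phoneme such as /k/ is misrecognized more often in one speaker group than in another.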
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Verkhodanova, V. |
Date Deposited: | 09 Sep 2022 08:38 |
Last Modified: | 09 Sep 2022 08:38 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/224 |