Monen, Janay (2022) Automatic Detection and Severity Estimation for Oral Cancer Speech. Master thesis, Voice Technology (VT).
PDF: MA 4840054 JSC Monen.pdf (1MB)
Abstract
Oral cancer (OC) surgery can cause impaired speech intelligibility by preventing the production of articulatory targets required for sounds such as plosives and alveolar sibilants (Halpern et al., 2020a; Halpern et al., 2022a; Halpern et al., 2022b). Various studies have already investigated OC speech characteristics through phonetic approaches that calculate phoneme error rates from transcription-based intelligibility assessments (Saravanan et al., 2016; Constantinescu et al., 2017), and acoustic approaches that examine formants and vowel space area (Bruijn et al., 2009; Rieger et al., 2010). Few studies, however, have looked into OC speech severity estimation (SE) or into distinguishing OC speech from healthy speech through machine learning (ML) (Halpern et al., 2020a). Unlike phonetic and acoustic approaches, ML models can automatically learn distinctions from acoustic features and estimate quantities such as severity scores, something which standard significance testing cannot achieve. Using ML for OC detection could therefore broaden our understanding of OC speech, in particular by showing us which speech features are important. Additionally, SE could assist with tracking speech therapy progress post-surgery (Suárez-Cunqueiro et al., 2008). We therefore explored OC detection and SE with four ML models: logistic regression (LR), support vector machines (SVMs), multilayer perceptrons (MLPs) and one-dimensional convolutional neural networks (1D-CNNs). Using these models, we investigated (1) whether we can distinguish OC speech from healthy speech and (2) whether SE of OC speech based on acoustics is possible. To avoid unwanted artifacts (Halpern et al., 2020a), we collected a dataset with 6 OC patients > 1 year post-surgery and 5 healthy controls. Additionally, we gathered data for a Dutch adaptation of the Speech Handicap Index (Van den Steen et al., 2011) and used the scores as ground truth for SE. Model performance was evaluated in terms of accuracy, area under the curve, sensitivity and specificity. Our findings confirm that OC speech detection is possible with models trained on long-term average spectrum (LTAS) features; the best performance on this task was achieved with the 1D-CNN (67.41% accuracy). We also found evidence that reliable OC speech SE is possible, in particular with the SVM trained on Mel-frequency cepstral coefficient (MFCC) features (68.73% accuracy). These outcomes suggest that model performance depends on the task, the feature type and several other factors that we address in our discussion.
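To make the described setup more concrete, below is a minimal, hypothetical sketch of one of the configurations mentioned in the abstract (MFCC features with an SVM classifier for OC vs. healthy detection, evaluated with accuracy and AUC). The library choices (librosa, scikit-learn), sampling rate, mean pooling over frames and hyperparameters are illustrative assumptions, not the thesis's actual pipeline.

```python
# Hypothetical sketch: MFCC features + SVM for OC speech detection.
# Library choices and parameters are assumptions, not the thesis's pipeline.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score


def mfcc_features(wav_path, n_mfcc=13, sr=16000):
    """Summarise one utterance as its mean MFCC vector (assumed pooling choice)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def detect_oc_speech(wav_paths, labels):
    """Train and evaluate an SVM; labels: 1 = OC speech, 0 = healthy control."""
    X = np.vstack([mfcc_features(p) for p in wav_paths])
    y = np.asarray(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    prob = clf.predict_proba(X_te)[:, 1]
    return accuracy_score(y_te, pred), roc_auc_score(y_te, prob)
```

With a corpus this small (11 speakers), a speaker-independent split such as leave-one-speaker-out cross-validation would be the more appropriate evaluation than the random train/test split shown here.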
Item Type: Thesis (Master)
Name supervisor: Verkhodanova, V.
Date Deposited: 09 Sep 2022 08:48
Last Modified: 09 Sep 2022 08:48
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/225