Topological Featurization of Speech Data for Speech Recognition

Laméris, Cárolos (2024) Topological Featurization of Speech Data for Speech Recognition. Master thesis, Voice Technology (VT).


Abstract

Topological data analysis (TDA) comprises a body of theory and a set of techniques for extracting features that describe the shape of data. Among other applications, it has been used to featurize a wide variety of time-series data. Research on the application of TDA to speech is, however, limited. This thesis gives a gentle introduction to TDA for time-series data, together with an interpretation of the TDA featurization of speech in particular. It also reviews the literature on TDA applied to time-series data in general and to speech data in particular. Three general methods of TDA featurization of speech are selected, from which concrete feature extraction methods are derived using a feature selection procedure based on a phone classification task. These final methods extract features from the following filtered data: (1) the sublevel and superlevel set filtrations of the mel spectrograms; (2) the sublevel and superlevel set filtrations of the audio signal; (3) the Vietoris-Rips filtration applied to the Takens-embedded signal, using a fixed delay and embedding dimension. Using the selected features, various DNN-HMM (deep neural network-hidden Markov model) models are trained on combinations of these derived TDA features and typical features. We find that training these models on TDA-based features alone leads to significantly worse performance. However, when typical features are combined with TDA features extracted using methods (1) or (2), small improvements in word error rate (WER) are achieved over typical features alone.
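
As a rough illustration of two of the constructions named in the abstract, the sketch below computes 0-dimensional persistence pairs for the sublevel set filtration of a one-dimensional signal, as in method (2), and a Takens delay embedding, the first step of method (3). It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `sublevel_persistence_0d` and `takens_embedding` are hypothetical, only NumPy is used, and the Vietoris-Rips step that would follow the embedding is not shown.

```python
import numpy as np


def takens_embedding(signal, dim, delay):
    """Map a 1-D signal to the points (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this dim and delay")
    # Column i holds the signal shifted by i * delay samples.
    return np.stack([x[i * delay:i * delay + n_points] for i in range(dim)], axis=1)


def sublevel_persistence_0d(signal):
    """(birth, death) pairs of connected components in the sublevel set
    filtration of a 1-D signal. Zero-persistence pairs are skipped and the
    component of the global minimum is reported with death = inf."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    parent = [-1] * n   # -1 marks samples not yet in the filtration
    birth = {}          # component root -> value at which it was born

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Add samples in order of increasing value (the filtration order).
    for i in np.argsort(x, kind="stable"):
        i = int(i)
        parent[i] = i
        birth[i] = x[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this merge.
                old, young = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
                if birth[young] < x[i]:
                    pairs.append((float(birth[young]), float(x[i])))
                parent[young] = old
    pairs.append((float(x.min()), float("inf")))
    return pairs


points = takens_embedding([0, 1, 2, 3, 4], dim=2, delay=2)
pairs = sublevel_persistence_0d([0, 2, 1, 3])
```

Under the same assumptions, the superlevel set filtration of a signal can be obtained by applying `sublevel_persistence_0d` to the negated signal and negating the resulting birth and death values.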

Item Type:	Thesis (Master)
Name supervisor:	Nayak, S.
Date Deposited:	03 Sep 2024 06:51
Last Modified:	03 Sep 2024 06:51
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/561
