Zhang, Ting (2024) CMGAN-Based Speech Enhancement for Automotive Environments: Targeted Noise Reduction. Master thesis, Voice Technology (VT).
Abstract
This research employs a Conformer Metric Generative Adversarial Network (CMGAN) model, trained on a tailored in-car noisy speech dataset. The methodology incorporates pre-training, hybrid training, and targeted training to assess the model’s performance in speech enhancement tasks. In total, four experiments were conducted to determine the most effective training strategies for the model. Results from these experimental setups confirm that this targeted training approach significantly enhances the ASR system’s accuracy and reliability. Particularly when fine-tuned for specific noise conditions, the CMGAN model demonstrated substantial improvements in evaluation metrics, such as the Perceptual Evaluation of Speech Quality (PESQ). Moreover, this study shows that the CMGAN model excels in reducing driving noise but shows less efficacy against street noise and air-conditioner noise. In addition to identifying the most effective training strategies for specific noise datasets, these findings also clarify the relationships between noise types and the effectiveness of speech enhancement. This research concludes that focusing on adaptive and specialized training frameworks can greatly improve ASR performance in real-world noise environments, providing valuable insights for advancing speech recognition technology in practical applications.
Item Type: | Thesis (Master) |
---|---|
Name supervisor: | Nayak, S. |
Date Deposited: | 20 Aug 2024 11:47 |
Last Modified: | 20 Aug 2024 11:47 |
URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/503 |