Personalized Speech Enhancement Using Time-Domain Convolutional Networks

Zhang, Ziyun (2025) Personalized Speech Enhancement Using Time-Domain Convolutional Networks. Master thesis, Voice Technology (VT).

Abstract

This paper proposes a personalized speech enhancement method based on time-domain convolutional networks, which achieves precise extraction of the target speaker’s speech by directly integrating speaker embeddings (d-vectors) into the time-domain processing pipeline of Conv-TasNet. Unlike existing frequency-domain methods, this research avoids the information loss caused by frequency-domain conversion and designs a multi-objective loss function to simultaneously optimize signal fidelity and speaker consistency. Experimental results show that the proposed method outperforms existing baseline methods on objective evaluation metrics, demonstrating especially strong robustness in low-SNR and complex mixing conditions. This research provides new technical approaches for the field of personalized speech enhancement, with potential applications in smart devices, remote communication, and assistive technologies.
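
The abstract does not spell out the form of the multi-objective loss. As a rough illustration only, the sketch below combines a signal-fidelity term with a speaker-consistency term. Everything specific in it is an assumption rather than the thesis's formulation: the use of negative SI-SNR for fidelity, cosine distance between speaker embeddings for consistency, the function names, and the `speaker_weight` value of 0.1.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length waveforms (lists of floats)."""
    n = len(target)
    # Remove the mean so the measure ignores DC offset.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to isolate the "clean" component.
    dot = sum(a * b for a, b in zip(e, t))
    scale = dot / (sum(b * b for b in t) + eps)
    proj = [scale * b for b in t]
    noise = [a - p for a, p in zip(e, proj)]
    proj_energy = sum(p * p for p in proj)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((proj_energy + eps) / (noise_energy + eps))


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def multi_objective_loss(estimate, target, est_embedding, ref_embedding,
                         speaker_weight=0.1):
    """Hypothetical combined loss: negative SI-SNR plus weighted cosine distance.

    est_embedding is assumed to be a d-vector computed from the enhanced output,
    ref_embedding the enrolled d-vector of the target speaker.
    """
    fidelity_term = -si_snr(estimate, target)
    speaker_term = 1.0 - cosine_similarity(est_embedding, ref_embedding)
    return fidelity_term + speaker_weight * speaker_term
```

In this sketch the loss decreases both when the enhanced waveform moves closer to the clean target and when its embedding aligns with the enrolled speaker's embedding; the relative weight of the two terms is a tunable choice.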

Item Type:	Thesis (Master)
Name supervisor:	Nayak, S.
Date Deposited:	07 Jul 2025 08:31
Last Modified:	07 Jul 2025 08:31
URI:	https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/688
