Yue, Jingxuan (2024) Identifying Acoustic Features that Enhance TTS Voice Intelligibility and Naturalness in Noisy Environments. Master thesis, Voice Technology (VT).
PDF: MSc-S5657660-J-Yue.pdf (774kB)
Abstract
As voice technology advances, text-to-speech (TTS) is increasingly used in daily life. However, real-world acoustic environments are complex and variable and contain different levels of noise, which challenges the intelligibility and naturalness of TTS voices. Research indicates that speech with Lombard characteristics is more intelligible in noisy environments. To identify which acoustic features effectively enhance the intelligibility and naturalness of synthetic speech in noisy conditions, this study conducted systematic experiments. The results show that enhancing F0 alone can significantly improve the intelligibility of synthetic speech in noisy environments, while increasing duration alone can enhance intelligibility but also decreases naturalness. In contrast, typical Lombard speech and flattening spectral tilt alone do not improve naturalness. These findings provide valuable insights for the development of more adaptive and user-centered TTS systems.
| Item Type: | Thesis (Master) |
|---|---|
| Name supervisor: | Coler, M.L. |
| Date Deposited: | 16 Jul 2024 13:58 |
| Last Modified: | 16 Jul 2024 13:58 |
| URI: | https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/518 |