Assessing the relationship between stimulus duration and Mean Opinion Score for speech synthesis evaluation

Hongell, Brandi (2024) Assessing the relationship between stimulus duration and Mean Opinion Score for speech synthesis evaluation. Master thesis, Voice Technology (VT).

PDF
MSc-s5541727-B-Hongell.pdf


Abstract

Despite rapid advancements in speech synthesis, the Mean Opinion Score (MOS), established in the 1990s and relatively unchanged since, remains the standard for evaluating speech synthesis. The lack of reassessment of MOS over time has raised many questions about the reliability and robustness of the field’s predominant evaluation metric. Therefore, this study critically assesses how non-standardized testing variables may affect MOS, using listening tests to measure how four different durations of synthetic speech clips interact with the MOS ratings of three different synthetic voices. The results show that duration does not have a statistically significant impact on the MOS of a synthetic voice and are therefore inconclusive; however, the reported effect size of 33.9% shows promise for continued research. This moderately strong effect size suggests the possibility of a meaningful association between duration and MOS. Overall, this study highlights the lack of standardization present in MOS evaluation and questions the reliability of this evaluation metric. It therefore recommends continued MOS research on not only duration but also other unstandardized variables, as well as the implementation of best practices in MOS testing.

Item Type: Thesis (Master)
Name supervisor: Coler, M.L.
Date Deposited: 11 Jul 2024 08:39
Last Modified: 11 Jul 2024 08:39
URI: https://campus-fryslan.studenttheses.ub.rug.nl/id/eprint/509
