Researchers have developed a metric-based approach to assessing the quality of text-to-speech (TTS) systems through voice mapping. The study evaluated six influential TTS models, including VITS, Glow-TTS, and Tacotron 2, using metrics such as crest factor, spectrum balance, and cepstral peak prominence (CPPs). The findings indicate that voice range is a key indicator of model capability: VITS shows the broadest range, while Glow-TTS excels in soft phonation. The study also found that CPPs values of 7 to 8 dB correlate with natural voice quality, whereas values above 10 dB can sound robotic.
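As a rough illustration of one of the metrics named above, crest factor is conventionally the ratio of a signal's peak amplitude to its RMS level, expressed in dB. The sketch below assumes that conventional definition; the summary does not say how the study windows or normalizes the signal, and the function name `crest_factor_db` and the test signals are hypothetical.

```python
import math


def crest_factor_db(samples):
    """Return the crest factor (peak-to-RMS ratio) of a signal in dB.

    Assumes the conventional definition; the study's exact procedure
    (windowing, filtering) is not given in the summary.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)


# Illustrative inputs (not data from the study):
# ten full periods of a unit-amplitude sine wave and of a square wave.
sine = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
square = [1.0 if n % 100 < 50 else -1.0 for n in range(1000)]

print(f"sine:   {crest_factor_db(sine):.2f} dB")
print(f"square: {crest_factor_db(square):.2f} dB")
```

A pure sine has a crest factor of about 3 dB and a square wave 0 dB, so the two test signals bracket the low end of the scale; speech typically sits well above both.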
Impact: Introduces new metrics for evaluating TTS naturalness and expressiveness, potentially guiding future model development.
Ranking rationale: Academic paper proposing a new evaluation framework for TTS systems.
AI-generated summary · Google Gemini · from 1 source.