PulseAugur

New metrics assess text-to-speech voice quality and naturalness

Researchers have developed a metric-based approach to assessing the quality of text-to-speech (TTS) systems through voice mapping. The study evaluated six influential TTS models, including VITS, Glow-TTS, and Tacotron 2, using metrics such as crest factor, spectrum balance, and cepstral peak prominence (CPPs). The findings indicate that voice range is a key indicator of model capability: VITS showed the broadest range, while Glow-TTS excelled in soft phonation. The study also reports that CPPs values of 7 to 8 dB correlate with natural voice quality, whereas values above 10 dB can sound robotic.
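As a rough illustration of two of the metrics named above, the sketch below computes a crest factor using its standard definition (peak amplitude over RMS, expressed in dB) and labels a CPPs value against the thresholds quoted in the summary. The function names `crest_factor_db` and `cpps_band` are hypothetical and the test signal is synthetic; this is not the paper's implementation, and the paper may compute its metrics differently (for example per cycle or per analysis frame).

```python
import math

def crest_factor_db(samples):
    """Crest factor: peak amplitude divided by RMS level, in dB (standard definition)."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("a silent signal has no defined crest factor")
    return 20.0 * math.log10(peak / rms)

def cpps_band(cpps_db):
    """Label a CPPs value using only the ranges quoted in the summary (an assumption)."""
    if 7.0 <= cpps_db <= 8.0:
        return "natural-sounding range"
    if cpps_db > 10.0:
        return "may sound robotic"
    return "not characterized in the summary"

# Synthetic test signal: one second of a 100 Hz sine sampled at 16 kHz.
sample_rate = 16000
sine = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

print(round(crest_factor_db(sine), 2))  # a pure sine has a crest factor of about 3.01 dB
print(cpps_band(7.5))
print(cpps_band(11.0))
```

A pure sine is a convenient sanity check because its peak-to-RMS ratio is exactly the square root of 2. Real speech typically has a higher crest factor.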

Impact: Introduces new metrics for evaluating TTS naturalness and expressiveness, potentially guiding future model development.

Ranking rationale: Academic paper proposing a new evaluation framework for TTS systems.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.AI · English · Huanchen Cai, Sten Ternström

    Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

    arXiv:2605.00861v1 · Announce type: cross · Abstract: This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. The study analyzes six TTS models, including historical and recent ones. The metrics are crest factor, spectrum balance, …