PulseAugur

New HPRO framework enhances emotional expressiveness in TTS models

Researchers have introduced HPRO, a hierarchical progressive reward optimization framework designed to improve emotional expressiveness in large language model (LLM)-based text-to-speech (TTS) systems. The framework targets two problems in existing preference-driven optimization methods: information conflict and scale gap. HPRO uses an HD-Emo codec to separate content tokens from emotional preference tokens, which isolates emotional optimization from semantic content and mitigates reward hacking. The system then progressively aligns objectives at the frame, word, and sentence levels, leading to enhanced emotional expression while preserving linguistic intelligibility.
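The idea of aligning objectives progressively across frame, word, and sentence levels can be illustrated with a minimal sketch. This is not the HPRO implementation: the schedule (enable one more level every `steps_per_stage` steps), the cumulative activation of levels, the plain averaging of rewards, and all function and parameter names are assumptions made for illustration only. The summary does not say how the paper schedules or weights the levels.

```python
from typing import Dict, List

# Hypothetical level order, from finest to coarsest granularity.
LEVELS: List[str] = ["frame", "word", "sentence"]


def active_levels(step: int, steps_per_stage: int) -> List[str]:
    """Return the reward levels enabled at a training step.

    Assumed schedule: start with the frame level only, then add one
    coarser level every `steps_per_stage` steps until all are active.
    """
    stage = min(step // steps_per_stage, len(LEVELS) - 1)
    return LEVELS[: stage + 1]


def combined_reward(rewards: Dict[str, float], step: int, steps_per_stage: int) -> float:
    """Average the per-level rewards of the currently active levels.

    Averaging is an assumption; it keeps the combined value on the same
    scale no matter how many levels are active.
    """
    levels = active_levels(step, steps_per_stage)
    return sum(rewards[name] for name in levels) / len(levels)


if __name__ == "__main__":
    # Made-up per-level reward values for a single sample.
    sample = {"frame": 1.0, "word": 0.0, "sentence": 0.5}
    for step in (0, 100, 250):
        print(step, active_levels(step, 100), combined_reward(sample, step, 100))
```

In this sketch, early training sees only the fine-grained frame reward, and coarser signals join later, so no single level dominates the objective from the start.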

IMPACT This research could lead to more emotionally nuanced and natural-sounding AI-generated speech, improving user experience in applications like virtual assistants and audio content creation.

RANK_REASON The cluster contains an academic paper detailing a new method for improving text-to-speech models.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Xiangmin Xu

    HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

    Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting emotional expressiveness. While preference-driven optimiz…