CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering
Researchers have introduced CoCoEmo, a framework for generating human-like emotional speech with text-to-speech (TTS) systems. It makes emotional expression controllable and composable, moving beyond a single emotion per utterance to support mixed emotions and speech whose emotion is misaligned with the text. The study reports that emotional prosody is primarily synthesized by the TTS language module, offering a lightweight approach to natural emotional speech synthesis.
IMPACT: Enables more nuanced and human-like emotional expression in TTS systems, potentially improving the user experience in voice assistants and other applications.
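
To make "composable" activation steering concrete, here is a minimal sketch of the generic technique, not CoCoEmo's actual implementation. It assumes an emotion direction can be estimated as the difference between mean hidden activations for emotional and neutral speech, and that a weighted mix of such directions is added to a hidden state of the TTS language module. The function names, the 3-dimensional toy activations, and the weights are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(emotional: List[Vector], neutral: List[Vector]) -> Vector:
    """Difference of mean activations: emotional minus neutral (an assumption)."""
    mu_e, mu_n = mean(emotional), mean(neutral)
    return [e - n for e, n in zip(mu_e, mu_n)]


def steer(hidden: Vector, directions: Dict[str, Vector],
          weights: Dict[str, float]) -> Vector:
    """Add a weighted mix of emotion directions to one hidden state."""
    out = list(hidden)
    for name, w in weights.items():
        out = [h + w * d for h, d in zip(out, directions[name])]
    return out


# Toy activations with made-up numbers; a real model would supply these.
neutral = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
happy = [[1.0, 1.0, 0.0], [3.0, 1.0, 0.0]]
sad = [[0.0, 1.0, -2.0], [0.0, 1.0, -4.0]]

directions = {
    "happy": steering_vector(happy, neutral),
    "sad": steering_vector(sad, neutral),
}

# Compose two emotions at once by weighting both directions.
mixed = steer([0.5, 0.5, 0.5], directions, {"happy": 0.5, "sad": 0.5})
print(mixed)  # [1.5, 0.5, -1.0]
```

Changing the weights shifts the blend, which is one simple way to read "controllable": a larger weight pushes the hidden state further along that emotion's direction.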