Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 12h

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

Researchers have introduced CoCoEmo, a framework for generating human-like emotional speech with text-to-speech (TTS) systems. It makes emotional expression controllable and composable, moving beyond a single emotion per utterance to support mixed emotions and speech whose emotion differs from what the text suggests. The study finds that emotional prosody is synthesized primarily by the TTS language module, which points to activation steering as a lightweight approach to natural emotional speech synthesis.
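As a rough illustration of what composable activation steering can mean in general, the sketch below builds one steering vector per emotion, blends them with weights, and adds the blend to a hidden state. This is not CoCoEmo's implementation: the difference-of-means recipe, the function names (`steering_vector`, `compose`, `steer`), and the toy two-dimensional activations are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(emotional: List[Vector], neutral: List[Vector]) -> Vector:
    """Assumed recipe: mean emotional activation minus mean neutral activation."""
    mu_e = mean_vector(emotional)
    mu_n = mean_vector(neutral)
    return [e - n for e, n in zip(mu_e, mu_n)]


def compose(vectors: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Blend several emotion directions into one, weighted per emotion."""
    dim = len(next(iter(vectors.values())))
    mixed = [0.0] * dim
    for name, weight in weights.items():
        for i, x in enumerate(vectors[name]):
            mixed[i] += weight * x
    return mixed


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; alpha sets the strength."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values, not taken from any model).
neutral = [[0.0, 0.0], [0.0, 2.0]]
directions = {
    "happy": steering_vector([[2.0, 1.0], [2.0, 1.0]], neutral),
    "sad": steering_vector([[0.0, 3.0], [0.0, 3.0]], neutral),
}
mixed = compose(directions, {"happy": 0.5, "sad": 0.5})
steered = steer([1.0, 1.0], mixed, alpha=2.0)
```

In a real TTS model the same addition would be applied to the language module's activations at inference time. Which layer to steer and how to set `alpha` are choices this sketch does not address.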

IMPACT Enables more nuanced, human-like emotional expression in TTS systems, which could improve the user experience of voice assistants and other speech applications.

Hugging Face
arXiv
CoCoEmo
Siyi Wang