Brief · PulseAugur

TOOL · arXiv cs.AI · 8h

An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis

Researchers have developed an emotional speech synthesis (ESS) system that integrates speaker embeddings and prosody bottlenecks into the FastSpeech 2 model. The system is designed to generate natural-sounding, human-like voices with a desired emotional expression. It can produce emotional speech for a single speaker, or transfer a speaking style from one speaker to another while preserving the target speaker's identity.
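To make the idea of conditioning on a speaker embedding and a prosody bottleneck concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not say how the two signals are combined, so the additive conditioning, the mean pooling, the dimensions, and every name (`condition`, `speaker_table`, `w_down`, `w_up`) are assumptions made for illustration.

```python
import random

def linear(x, w):
    """Multiply vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def condition(hidden, speaker_vec, ref_prosody, w_down, w_up):
    """Add speaker identity and a bottlenecked style vector to every frame.

    hidden      : encoder outputs, one vector per phoneme (T x HIDDEN)
    speaker_vec : embedding of the target speaker (HIDDEN)
    ref_prosody : prosody features of a reference utterance (N x PROSODY)
    """
    n = len(ref_prosody)
    # Mean-pool the reference frames into one utterance-level vector (assumed).
    pooled = [sum(f[k] for f in ref_prosody) / n for k in range(len(ref_prosody[0]))]
    # The narrow projection is the "bottleneck": it limits how much
    # information from the reference can pass through.
    bottleneck = linear(pooled, w_down)
    style = linear(bottleneck, w_up)
    # Assumed additive conditioning, broadcast over all frames.
    return [[h + s + p for h, s, p in zip(frame, speaker_vec, style)]
            for frame in hidden]

rng = random.Random(0)
HIDDEN, BOTTLENECK, PROSODY = 8, 2, 4  # hypothetical sizes

# Hypothetical speaker embedding table with two speakers.
speaker_table = {name: [rng.uniform(-1, 1) for _ in range(HIDDEN)]
                 for name in ("speaker_a", "speaker_b")}
w_down = rand_matrix(PROSODY, BOTTLENECK, rng)
w_up = rand_matrix(BOTTLENECK, HIDDEN, rng)

hidden = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(5)]
reference = [[rng.uniform(-1, 1) for _ in range(PROSODY)] for _ in range(20)]

# Style transfer in this sketch: the reference utterance supplies the style,
# while the target speaker's embedding supplies the identity.
out = condition(hidden, speaker_table["speaker_b"], reference, w_down, w_up)
```

In this sketch, single-speaker emotional synthesis and cross-speaker style transfer differ only in which speaker's embedding is paired with the reference utterance. A real FastSpeech 2 system would pass the conditioned sequence on to its variance adaptor and decoder.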

Hugging Face
arXiv
FastSpeech 2
VLSP 2022