Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation
Researchers have developed new unified models for generating human vocal audio, covering both speech and singing. UniVoice uses conditional flow matching and separates content, melody, and timbre, which allows independent control over speech prosody and singing melody. UniSinger, built on a multimodal diffusion transformer, unifies speaker-cloning song generation and singing voice conversion, with accompaniment co-generation. Both models are reported to achieve state-of-the-art performance on their respective tasks, offering new possibilities for audio generation and music production.
AI IMPACT: These models advance the state of the art in unified audio generation, potentially impacting music production and accessibility tools.