New models unify speech and singing voice generation

By PulseAugur Editorial · [3 sources] · 2026-06-05 07:59

Researchers have introduced two unified models for generating human vocal audio. UniVoice produces both speech and singing with a conditional flow matching approach, separating content, melody, and timbre so that speech prosody and singing melody can be controlled independently. UniSinger, built on a multimodal diffusion transformer, unifies speaker-cloning song generation and singing voice conversion with accompaniment co-generation. Both papers report state-of-the-art performance on their respective tasks, opening new possibilities for audio generation and music production.
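For readers unfamiliar with conditional flow matching, the generic training objective can be sketched as follows. This is not UniVoice's implementation: the straight noise-to-data path, the function names, the feature sizes, and the toy model are all assumptions made for illustration. Only the three conditioning streams (content, melody, timbre) come from the summary above.

```python
import numpy as np


def build_condition(content, melody, timbre):
    """Concatenate the three conditioning streams (hypothetical layout)."""
    return np.concatenate([content, melody, timbre], axis=1)


def cfm_training_pair(x1, rng):
    """Sample a point on a straight noise-to-data path and its velocity."""
    x0 = rng.standard_normal(x1.shape)        # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))    # one time value per example
    x_t = (1.0 - t) * x0 + t * x1             # interpolated sample at time t
    u_t = x1 - x0                             # constant velocity of the path
    return x_t, t, u_t


def cfm_loss(predict_velocity, x1, cond, rng):
    """Mean squared error between predicted and target velocity."""
    x_t, t, u_t = cfm_training_pair(x1, rng)
    v = predict_velocity(x_t, t, cond)
    return float(np.mean((v - u_t) ** 2))


def toy_velocity_model(x_t, t, cond):
    """Hypothetical stand-in for a neural network; always predicts zero."""
    return np.zeros_like(x_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_features = rng.standard_normal((4, 8))   # assumed target features
    cond = build_condition(
        rng.standard_normal((4, 3)),               # content (assumed size)
        rng.standard_normal((4, 2)),               # melody (assumed size)
        rng.standard_normal((4, 5)),               # timbre (assumed size)
    )
    print(cfm_loss(toy_velocity_model, audio_features, cond, rng))
```

In a real system the toy model would be replaced by a trained network, and keeping the conditioning streams as separate inputs is what would let a caller vary one of them while holding the others fixed.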

IMPACT These models advance the state of the art in unified audio generation, with potential applications in music production and accessibility tools.

RANK_REASON Two research papers introducing new models for audio generation.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.AI TIER_1 English(EN) · Ziyu Zhang, Chunyu Qiang, Xiaopeng Wang, Yuxin Guo, Kang Yin, Wenjie Tian, Jingbin Hu, Tianlun Zuo, Zhao Guo, Teng Ma, Yuzhe Liang, Chen Zhang, Lei Xie · 2026-06-08 04:00

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

arXiv:2606.07015v1 Announce Type: cross Abstract: While song generation and singing voice conversion (SVC) have evolved significantly, they have long been developed isolated: the former lacks zero-shot speaker cloning, while the latter overlooks vocal-accompaniment synergy. To br…

arXiv cs.AI TIER_1 English(EN) · Junjie Zheng, Huixin Xue, Shihong Ren, Chaofan Ding, Hao Liu, Zihao Chen · 2026-06-06 04:00

UniVoice: A Unified Model for Speech and Singing Voice Generation

arXiv:2606.05852v1 Announce Type: cross Abstract: Text-to-speech (TTS) and singing voice synthesis (SVS) both aim to generate human vocal audio from symbolic inputs, but they impose different requirements on the generation process. Speech generation relies on flexible, language-d…

arXiv cs.AI TIER_1 English(EN) · Lei Xie · 2026-06-05 07:59

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

While song generation and singing voice conversion (SVC) have evolved significantly, they have long been developed isolated: the former lacks zero-shot speaker cloning, while the latter overlooks vocal-accompaniment synergy. To bridge this gap, we propose UniSinger, the first end…
