Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
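
The brief does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. The weights, the 24-hour half-life, the choice to apply time decay as a multiplier, and every name here (`WEIGHTS`, `HALF_LIFE_HOURS`, `time_decay`, `score_story`) are assumptions made for illustration.

```python
# Hypothetical weights for the three non-temporal signals (assumed, not from the brief).
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed: a story loses half its score every 24 hours


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay factor in (0, 1]; equals 1.0 for a brand-new story."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score_story(authority: float, cluster_strength: float,
                headline_signal: float, age_hours: float) -> float:
    """Combine three signals in [0, 1] with a time-decay factor into a 0-100 score."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Weighted average of the signals, then scaled down by the age of the story.
    base = sum(WEIGHTS[name] * value for name, value in signals.items())
    return round(min(100.0, 100.0 * base * time_decay(age_hours)), 1)
```

Under these assumptions, a story with perfect signals scores 100 when it is new and half that after one half-life; an additive decay term would be an equally plausible design.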

RESEARCH · arXiv cs.AI English(EN) · 2w · [3 sources]

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

Researchers have developed new unified models for generating human vocal audio, covering both speech and singing. UniVoice uses a conditional flow matching approach and separates content, melody, and timbre, which allows distinct control over speech prosody and singing melody. UniSinger, built on a multimodal diffusion transformer, unifies speaker-cloning song generation and singing voice conversion, with accompaniment co-generation. Both models are reported to achieve state-of-the-art performance on their respective tasks, opening new possibilities for audio generation and music production.

IMPACT These models advance the state of the art in unified audio generation and may affect music production and accessibility tools.

TOOL · Hugging Face Daily Papers English(EN) · 1mo

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models

Researchers have developed VocalParse, a singing voice transcription model built on a Large Audio Language Model (LALM). It addresses limitations of current systems by jointly modeling lyrics, melody, and text-note alignments through an interleaved prompting formulation. VocalParse also uses a Chain-of-Thought strategy that decodes the lyrics first, which helps maintain structural integrity and improves transcription accuracy. It achieves state-of-the-art results on various singing datasets.

IMPACT Advances singing voice transcription accuracy and scalability, potentially improving tools for music production and analysis.
