Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
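As a rough illustration of how a 0–100 score of this kind could be assembled, the sketch below combines three component scores with a time-decay factor. The brief does not publish its formula, so the weights, the 24-hour half-life, and the name `score_item` are all hypothetical assumptions, not the site's actual method.

```python
# Hypothetical component weights (percent); assumed for illustration, not taken from the brief.
WEIGHTS = {"authority": 40, "cluster": 30, "headline": 30}

# Assumed half-life: an item loses half its score every 24 hours.
HALF_LIFE_HOURS = 24.0


def score_item(authority: float, cluster: float, headline: float, age_hours: float) -> float:
    """Combine component scores in [0, 1] into a 0-100 score with exponential time decay."""
    components = {"authority": authority, "cluster": cluster, "headline": headline}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")

    # Weighted sum of the components, already on a 0-100 scale.
    base = sum(WEIGHTS[name] * value for name, value in components.items())
    # Exponential decay: factor is 1.0 for a fresh item, 0.5 after one half-life.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(base * decay, 1)


if __name__ == "__main__":
    # Example: a strong item from a well-regarded source, four days old.
    print(score_item(authority=0.9, cluster=0.5, headline=0.7, age_hours=96))
```

A multiplicative decay keeps the result inside 0–100 without clamping; an additive age penalty would be an equally plausible design.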

TOOL · Hugging Face Daily Papers English(EN) · 4d

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

SwanVoice is a zero-shot text-to-speech system that generates expressive, long-form speech for both monologue and multi-speaker dialogue. It combines a VAE, a flow-matching DiT, and diffusion post-training, and builds on a new dataset, SwanData-Speech. The system aims to overcome limitations in acoustic consistency and affective continuity across dialogue turns. On the SwanBench-Speech benchmark it outperforms existing open-source baselines in richness and hierarchy, though content accuracy remains a challenge.

IMPACT Introduces a new method for more natural and coherent multi-speaker dialogue synthesis, potentially improving conversational AI agents.

RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [4 sources]

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Researchers have developed new methods to improve the efficiency and performance of speech processing models. FastSLM introduces a hierarchical temporal abstractor that compresses audio significantly while retaining crucial acoustic detail, outperforming state-of-the-art models with fewer resources. SALSA is a lightweight adaptation technique for speech-aware large language models that learns specific steering vectors to improve generalization to diverse and out-of-domain speech. A further training optimization method jointly adjusts performance and computational complexity in speech models, enabling dynamic size optimization without post-hoc pruning.

IMPACT These advancements aim to improve the efficiency and adaptability of speech models, potentially enabling more robust and versatile AI applications in audio processing and language understanding.
