BareWave: Waveform-Native Flow-Matching Text-to-Speech
Researchers have developed BareWave, a novel text-to-speech system that generates audio directly from text without intermediate representations. This waveform-native approach addresses challenges in raw waveform modeling by aligning representations, using staged noise schedules, and incorporating velocity-aware perceptual alignment. The system demonstrates strong performance in zero-shot voice cloning, achieving high intelligibility, speaker similarity, and naturalness.
AI IMPACT: Introduces a waveform-native approach to TTS, potentially simplifying model architectures and improving voice cloning capabilities.
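
For readers unfamiliar with the "flow-matching" in the title, the sketch below shows the generic idea on a toy raw waveform: a model is trained to predict the velocity that carries a noise sample toward a data sample along a straight path, and sampling integrates that velocity with a few Euler steps. This is a textbook straight-path formulation and not BareWave's implementation; the function names, the 220 Hz toy signal, and the `oracle` stand-in for a trained network are all assumptions made for illustration, and the staged noise schedules and velocity-aware perceptual alignment mentioned in the summary are not modeled here.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to waveform x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return x1 - x0

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - target_velocity(x0, x1)) ** 2))

def euler_sample(predict_velocity, x0, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * predict_velocity(x, i * dt)
    return x

# Toy data: 0.1 s of a 220 Hz sine at 16 kHz stands in for a speech waveform.
rng = np.random.default_rng(0)
x1 = np.sin(2 * np.pi * 220 * np.arange(1600) / 16000)
x0 = rng.standard_normal(x1.shape)

# Hypothetical "perfect" model that already knows the target velocity.
# A real system would use a text-conditioned neural network here.
oracle = lambda x_t, t: x1 - x0

loss = flow_matching_loss(oracle, x0, x1, t=0.3)
x_hat = euler_sample(oracle, x0)
```

With the oracle velocity the loss is zero and the Euler integration lands on the target waveform; a trained network would only approximate this, which is where design choices such as the noise schedule come into play.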