PulseAugur / Brief

last 24h
[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

    Researchers have developed ZeSTA, a framework that uses zero-shot text-to-speech (ZS-TTS) as a data augmentation source for personalized speech synthesis. It targets the loss of speaker similarity that occurs when synthetic and real speech are mixed during fine-tuning. ZeSTA uses domain-conditioned training, which distinguishes real from synthetic speech, together with oversampling of real data to stabilize adaptation, particularly in low-resource scenarios.

    IMPACT This research could lead to more efficient and effective personalized voice generation, particularly in scenarios with limited training data.
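
    To make the two ingredients named in the summary concrete, here is a minimal sketch of tagging utterances by domain and oversampling the real ones. It is an illustration under stated assumptions, not the ZeSTA implementation: the names `REAL`, `SYNTHETIC`, `build_training_pool`, and the `real_oversample` factor are hypothetical, and the brief does not say how the authors condition the model or what ratio they use.

```python
# Hypothetical domain ids; the paper's actual conditioning scheme is not given here.
REAL, SYNTHETIC = 0, 1


def build_training_pool(real_utts, synthetic_utts, real_oversample=4):
    """Tag each utterance with its domain and repeat the real ones.

    Each entry is (utterance, domain_id). A model could consume domain_id
    as an extra conditioning input so it can tell real from synthetic speech.
    """
    pool = [(u, REAL) for u in real_utts] * real_oversample
    pool += [(u, SYNTHETIC) for u in synthetic_utts]
    return pool


# Two real recordings and eight zero-shot TTS samples (placeholder names).
pool = build_training_pool(["real_0", "real_1"], [f"syn_{i}" for i in range(8)])
real_share = sum(1 for _, d in pool if d == REAL) / len(pool)
```

    With these placeholder inputs, the repeated real utterances make up half of the pool even though they are a fifth of the unique data, which is the balancing effect oversampling is meant to provide.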

  2. Continual Speaker Identity Unlearning with Minimal Interference

    Researchers have developed Cumulative ORThogonal Identity Suppression (CORTIS), a framework for continually unlearning speaker identities from zero-shot text-to-speech (ZS-TTS) models. Existing methods fail when unlearning requests arrive sequentially, because a later request can revive previously unlearned speakers. CORTIS uses Fisher-information-based parameter masking and orthogonal projection so that an unlearned speaker identity stays forgotten through subsequent unlearning requests, without access to the previously unlearned data. The approach was demonstrated on the VoiceBox model, where it outperformed sequential application of prior methods.

    IMPACT This research addresses a critical privacy concern in generative audio models, enabling more robust and sequential unlearning of sensitive data.
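
    The two generic building blocks the summary names can be sketched in a few lines. This is not the CORTIS algorithm, whose details the brief does not give: `fisher_mask`, `project_out`, and the `keep_fraction` parameter are hypothetical names, the Fisher information is approximated by its diagonal (mean squared gradient), and the basis vectors are assumed to be orthonormal.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fisher_mask(grads_per_sample, keep_fraction=0.5):
    """Diagonal Fisher estimate (mean squared gradient per parameter).

    Returns a 0/1 mask selecting the parameters with the largest estimate.
    """
    n = len(grads_per_sample)
    dim = len(grads_per_sample[0])
    fisher = [sum(g[i] ** 2 for g in grads_per_sample) / n for i in range(dim)]
    k = max(1, int(dim * keep_fraction))
    top = set(sorted(range(dim), key=lambda i: fisher[i], reverse=True)[:k])
    return [1.0 if i in top else 0.0 for i in range(dim)]


def project_out(update, basis):
    """Remove the components of `update` along each orthonormal basis vector.

    Assumption: `basis` spans directions used by earlier unlearning steps,
    so the projected update leaves those directions untouched.
    """
    out = list(update)
    for b in basis:
        c = dot(out, b)
        out = [o - c * x for o, x in zip(out, b)]
    return out


mask = fisher_mask([[1, 0, 2], [1, 0, 2]], keep_fraction=0.34)
safe_update = project_out([2, 3, 4], basis=[[1, 0, 0]])
```

    The projected update has zero component along the stored direction, which illustrates how a later unlearning step can avoid moving the model along directions used by an earlier one.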