Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
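The brief does not publish its scoring formula, so the following is only a minimal sketch of how four such signals might be combined into a 0–100 score. The `Story` fields, the weights, and the 24-hour half-life are all assumptions made for illustration, not the aggregator's actual method.

```python
from dataclasses import dataclass


@dataclass
class Story:
    # All field names and ranges are hypothetical; each signal is assumed to lie in [0, 1].
    authority: float         # how trusted the source is
    cluster_strength: float  # how many independent sources cover the same story
    headline_signal: float   # how informative the headline is
    age_hours: float         # time since publication


def clamp01(x: float) -> float:
    """Clamp a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def score(story: Story, half_life_hours: float = 24.0,
          weights: tuple = (0.4, 0.35, 0.25)) -> float:
    """Weighted sum of three signals, scaled by exponential time decay."""
    w_auth, w_cluster, w_head = weights  # assumed weights that sum to 1
    base = (w_auth * clamp01(story.authority)
            + w_cluster * clamp01(story.cluster_strength)
            + w_head * clamp01(story.headline_signal))
    # The score halves every `half_life_hours` (an assumed decay schedule).
    decay = 0.5 ** (max(0.0, story.age_hours) / half_life_hours)
    return round(100.0 * base * decay, 1)


if __name__ == "__main__":
    print(score(Story(authority=0.9, cluster_strength=0.5,
                      headline_signal=0.7, age_hours=10.0)))
```

Multiplying by the decay term, rather than adding it as a fourth weighted signal, means an old story can never outrank a fresh one on age alone; either design would fit the description above.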

TOOL · arXiv cs.AI English(EN) · 10h

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

Researchers introduce ZeSTA, a framework that uses zero-shot text-to-speech (ZS-TTS) as a data augmentation source for personalized speech synthesis. It targets the loss of speaker similarity that commonly occurs when synthetic and real speech are mixed during fine-tuning. ZeSTA combines domain-conditioned training, which distinguishes real from synthetic speech, with oversampling of real data to stabilize adaptation, particularly in low-resource scenarios.

IMPACT This research could lead to more efficient and effective personalized voice generation, particularly in scenarios with limited training data.
- LibriTTS
- ZS-TTS
- ZeSTA
- Youngwon Choi
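The ZeSTA summary above mentions two ingredients: a domain label that separates real from synthetic speech, and oversampling of the real data. The sketch below shows one way a mixed training list could be assembled along those lines. The function name, the integer domain IDs, and the oversampling factor of 4 are placeholders chosen for illustration; the summary does not state how the paper implements either step.

```python
import random

# Hypothetical domain IDs that a model could consume as a conditioning input.
REAL, SYNTHETIC = 0, 1


def build_training_list(real_utts, synth_utts, real_oversample=4, seed=0):
    """Mix real and synthetic utterances, tagging each with its domain.

    Real utterances are repeated `real_oversample` times so the scarce real
    data is not drowned out by the larger synthetic set. The factor is an
    assumed placeholder, not a value taken from the paper.
    """
    if real_oversample < 1:
        raise ValueError("real_oversample must be at least 1")
    items = [(utt, REAL) for utt in real_utts] * real_oversample
    items += [(utt, SYNTHETIC) for utt in synth_utts]
    # Seeded shuffle so the mixture is reproducible across runs.
    random.Random(seed).shuffle(items)
    return items


if __name__ == "__main__":
    mixed = build_training_list(["real_a.wav", "real_b.wav"],
                                ["syn_1.wav", "syn_2.wav", "syn_3.wav"])
    for path, domain in mixed:
        print(path, "real" if domain == REAL else "synthetic")
```

During fine-tuning, the domain ID would be passed to the model alongside each utterance, and synthesis would then request the real domain; that usage is likewise an assumption about how such conditioning is typically applied.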
TOOL · arXiv cs.CL English(EN) · 10h

Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades

A new research paper analyzes how Korean speech recognition errors affect large language models (LLMs) in spoken question answering (SQA). The degradation caused by recognition errors is consistent across different LLMs, suggesting that information loss at the speech recognition stage is the primary driver of the performance decline. The paper also identifies single-character errors in Korean transcriptions as a distinctive vulnerability: one wrong character can alter the intended question and reduce QA accuracy. An auxiliary comparison indicates that large audio language models, which process audio directly, may be more robust to such transcription errors.

IMPACT Highlights potential for direct audio input models to improve spoken language understanding in noisy conditions.
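To make the single-character vulnerability concrete, the sketch below computes character error rate (CER) with a standard Levenshtein edit distance. The example sentence pair is our own illustration, not data from the paper: swapping one syllable block turns 사고 ("accident") into 사과 ("apple"), which changes what is being asked even though the CER stays low. Treating each precomposed Hangul syllable as one character (via NFC normalization) is an assumption; the paper may count errors differently.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # NFC keeps each Hangul syllable block as a single code point.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "사고가 났어요"   # "An accident happened"
    hypothesis = "사과가 났어요"  # one syllable off: "accident" became "apple"
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single substitution in a seven-character reference gives a CER of about 0.14, which illustrates why an aggregate error rate alone can understate the damage a transcription error does to the downstream question.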
