PulseAugur
research · [1 source]

New TTS framework synthesizes Tibetan dialects from limited data

Researchers have developed FMSD-TTS, a few-shot text-to-speech system that generates speech for Tibetan, a low-resource language, across its three main dialects: Ü-Tsang, Amdo, and Kham. The system uses a speaker-dialect fusion module and a Dialect-Specialized Dynamic Routing Network to capture dialectal variation while preserving speaker identity. Evaluations show that FMSD-TTS outperforms existing methods in dialectal expressiveness and speaker similarity, and the synthesized speech is validated on a speech-to-speech dialect conversion task.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Enables synthetic speech generation for Tibetan dialects from limited data, potentially aiding dialect preservation and accessibility.

RANK_REASON This is a research paper describing a new text-to-speech system for a low-resource language.


COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Yutong Liu, Ziyue Zhang, Ban Ma-bao, Yuqing Cai, Yongbin Yu, Renzeng Duojie, Xiangxiang Wang, Fan Gao, Cheng Huang, Nyima Tashi ·

    FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

    arXiv:2505.14351v4 Announce Type: replace-cross Abstract: Tibetan is a low-resource language with minimal parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham), limiting progress in speech modeling. To address this issue, we propose FMSD-TTS, a few-sh…