PulseAugur

New TTS system tackles Japanese kanji polyphony with massive data

Researchers have developed Sarashina2.2-TTS, a Japanese text-to-speech (TTS) system designed to handle context-dependent kanji polyphony, where the correct reading of a kanji depends on the surrounding text. The system uses a dataset of approximately 361,000 hours of speech, including a balanced mix of Japanese and English, and a targeted data augmentation pipeline aimed at the 2,136 Joyo kanji. To evaluate it, the authors introduce a new benchmark, the Joyo Kanji Yomi Benchmark, and a metric, Kana-CER, that focuses on pronunciation correctness. Sarashina2.2-TTS achieves state-of-the-art kanji reading accuracy and speaker similarity in zero-shot Japanese speech synthesis, and also shows improved cross-lingual robustness.
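The summary does not say how Kana-CER is computed. As a rough illustration only, the sketch below assumes it is an ordinary character error rate (edit distance divided by reference length) applied to kana readings rather than to kanji text. The function names `edit_distance` and `kana_cer` are hypothetical, and the sketch assumes both readings are already available as kana strings.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per code point."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def kana_cer(ref_kana: str, hyp_kana: str) -> float:
    """Assumed definition: edit distance over kana, divided by reference length."""
    if not ref_kana:
        raise ValueError("reference reading must not be empty")
    return edit_distance(ref_kana, hyp_kana) / len(ref_kana)


# 今日 can be read as きょう or こんにち depending on context.
correct = kana_cer("きょう", "きょう")    # 0.0: the reading matches
wrong = kana_cer("きょう", "こんにち")    # 4 edits over 3 reference characters
```

Scoring on kana rather than on the written text means a misread kanji is penalized even when a transcript would render both readings with the same characters, which is presumably why the metric focuses on pronunciation.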

IMPACT More accurate kanji readings in Japanese TTS could improve accessibility tools and speech applications for Japanese speakers.

RANK_REASON The cluster describes a new research paper detailing a novel TTS system and benchmark.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

    While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-depende…