New TTS system tackles Japanese kanji polyphony with massive data

By PulseAugur Editorial · [1 source] · 2026-06-24 03:57

Researchers have developed Sarashina2.2-TTS, a Japanese text-to-speech (TTS) system designed to handle context-dependent kanji polyphony, where the same character is read differently depending on its context. The system uses approximately 361,000 hours of speech with a balanced mix of Japanese and English, together with a targeted data augmentation pipeline covering the 2,136 Joyo kanji. To evaluate pronunciation correctness, the authors introduce a new benchmark, the Joyo Kanji Yomi Benchmark, and a metric called Kana-CER. According to the paper, Sarashina2.2-TTS achieves state-of-the-art kanji reading accuracy and speaker similarity in zero-shot Japanese speech synthesis, along with improved cross-lingual robustness.
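
The summary does not define Kana-CER. Assuming it follows the usual character error rate (edit distance divided by reference length) computed over kana readings rather than kanji text, a minimal sketch might look like the following. The function names `edit_distance` and `kana_cer` are hypothetical and not taken from the paper, and how the kana reading of the synthesized audio is obtained is left outside the sketch.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def kana_cer(ref_kana: str, hyp_kana: str) -> float:
    """Character error rate over kana readings (assumed definition)."""
    # Normalize so that voiced kana such as "が" count as one character.
    ref = unicodedata.normalize("NFC", ref_kana)
    hyp = unicodedata.normalize("NFC", hyp_kana)
    if not ref:
        raise ValueError("reference reading must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# 今日 is normally read "きょう"; a system that reads it as "こんにち"
# is penalized even though the written text is identical.
score = kana_cer("きょう", "こんにち")
```

Scoring on kana rather than on the written text is what lets such a metric separate a wrong reading from a correct one when both correspond to the same kanji.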

IMPACT This development advances Japanese language capabilities in TTS, potentially improving accessibility and applications for Japanese speakers.

RANK_REASON The cluster describes a new research paper detailing a novel TTS system and benchmark.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-24 03:57

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-depende…
