New Japanese TTS system tackles kanji polyphony with massive data scaling

By PulseAugur Editorial · [2 sources] · 2026-06-24 03:57

Researchers have developed Sarashina2.2-TTS, a text-to-speech system designed specifically for Japanese. It targets kanji polyphony, where the same kanji can be read in different ways depending on context. The system is trained on approximately 361,000 hours of speech, including a balanced mix of Japanese and English, and uses targeted data augmentation to improve kanji reading accuracy. The work also introduces the Joyo Kanji Yomi Benchmark and a new metric, Kana-CER, to evaluate pronunciation correctness. In experiments, the system achieves state-of-the-art kanji-level reading accuracy and high speaker similarity in zero-shot synthesis, and it shows improved cross-lingual robustness.
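
The summary does not define Kana-CER. A plausible reading, assumed here and not confirmed by the sources, is a character error rate computed on kana readings rather than on mixed kanji text. Scoring kana means a wrong reading of a polyphonic kanji counts as an error even when the written form would match. The sketch below assumes both the reference reading and the reading recovered from the synthesized audio are already available as kana strings. It also assumes katakana is folded to hiragana before comparison. The function names are hypothetical.

```python
def kata_to_hira(text: str) -> str:
    """Fold full-width katakana (U+30A1..U+30F6) to hiragana via a fixed code-point offset.

    Assumption: the metric treats katakana and hiragana spellings as equivalent.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def kana_cer(ref_kana: str, hyp_kana: str) -> float:
    """Hypothetical Kana-CER: edit distance between kana readings / reference length."""
    ref = kata_to_hira(ref_kana)
    hyp = kata_to_hira(hyp_kana)
    if not ref:
        raise ValueError("reference reading must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# 生物 can be read なまもの ("raw food") or せいぶつ ("living organism").
print(kana_cer("なまもの", "せいぶつ"))  # wrong reading: every kana differs
print(kana_cer("ナマモノ", "なまもの"))  # same reading in katakana vs hiragana
```

Under these assumptions, choosing the wrong reading of 生物 gives a Kana-CER of 1.0, while a katakana spelling of the correct reading scores 0.0. A text-level CER on the kanji string would miss this reading error entirely.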

IMPACT This research advances Japanese speech synthesis capabilities, potentially improving accessibility and applications for Japanese language users.

RANK_REASON The cluster describes a new research paper detailing a novel TTS system and benchmark.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Lianbo Liu, Shiao Zhu, Kai Washizaki, Reo Yoneyama, Haesung Jeon, Mengjie Zhao, Yusuke Fujita, Hao Shi, Nao Yoshida, Yuan Gao, Roman Koshkin, Yukiya Hono, Yui Sudo · 2026-06-25 04:00

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

arXiv:2606.25369v1 Announce Type: cross Abstract: While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguis…
arXiv cs.CL TIER_1 English(EN) · Yui Sudo · 2026-06-24 03:57

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-depende…
