Researchers have developed Sarashina2.2-TTS, a text-to-speech system designed specifically for Japanese that addresses kanji polyphony, where a single kanji can have several possible readings. The system is trained on approximately 361,000 hours of speech, including a balanced mix of Japanese and English, and uses targeted data augmentation to improve kanji reading accuracy. The authors also introduce the Joyo Kanji Yomi Benchmark and a new metric, Kana-CER, to evaluate pronunciation correctness. Experiments show that the system achieves state-of-the-art kanji-level reading accuracy and high speaker similarity in zero-shot synthesis, along with improved cross-lingual robustness.
IMPACT This research advances Japanese speech synthesis capabilities, potentially improving accessibility and applications for Japanese language users.
RANK_REASON The cluster describes a new research paper detailing a novel TTS system and benchmark.
- Agency for Cultural Affairs
- English
- Japan
- Japanese
- Joyo Kanji Yomi Benchmark
- Kana-CER
- kanji
- Sarashina2.2-TTS
- Text To Speech
- Standard Chinese
AI-generated summary · Google Gemini · from 2 sources.