Researchers have developed Sarashina2.2-TTS, a Japanese text-to-speech (TTS) system designed to handle context-dependent kanji polyphony, where a single kanji takes different readings depending on the surrounding text. The system is trained on approximately 361,000 hours of speech, a balanced mix of Japanese and English, and uses a targeted data augmentation pipeline to cover the 2,136 Joyo kanji. To evaluate pronunciation correctness, the authors introduce a new benchmark, the Joyo Kanji Yomi Benchmark, and a metric called Kana-CER. Sarashina2.2-TTS is reported to achieve state-of-the-art kanji reading accuracy and speaker similarity in zero-shot Japanese speech synthesis, along with improved cross-lingual robustness.
AI IMPACT This development advances Japanese-language TTS, which could improve accessibility tools and other speech applications for Japanese speakers.
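The summary names Kana-CER but does not say how it is computed. The sketch below is one plausible reading, not the paper's definition: it assumes Kana-CER is a character error rate over kana readings, that is, the Levenshtein edit distance between a reference reading and the reading recovered from synthesized speech, divided by the reference length. The function names `to_hiragana`, `edit_distance`, and `kana_cer` are hypothetical, and folding katakana into hiragana before comparison is also an assumption.

```python
def to_hiragana(text: str) -> str:
    """Fold katakana (U+30A1..U+30F6) into hiragana so script choice is not scored as an error."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def kana_cer(ref_kana: str, hyp_kana: str) -> float:
    """Assumed Kana-CER: edit distance over kana divided by reference length."""
    ref = to_hiragana(ref_kana)
    hyp = to_hiragana(hyp_kana)
    if not ref:
        raise ValueError("reference reading must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # 日本 can be read "にほん" or "にっぽん"; a mismatch shows up as a nonzero score.
    print(kana_cer("にほん", "にっぽん"))
    # The same reading written in katakana scores as identical under this sketch.
    print(kana_cer("キョウ", "きょう"))
```

Comparing in kana rather than in the original kanji text is what lets a metric of this kind penalize a wrong reading even when the written form is identical.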
Read on Hugging Face Daily Papers →
- Agency for Cultural Affairs
- English
- Japan
- Japanese
- Joyo Kanji Yomi Benchmark
- Kana-CER
- Sarashina2.2-TTS
AI-generated summary · Google Gemini · from 1 source.