Researchers have introduced ParsVoice, a large corpus of Persian speech and text designed to advance text-to-speech (TTS) synthesis and other speech processing tasks for Persian. The dataset comprises 2,200 hours of TTS-ready audio in more than 1.36 million aligned segments from 1,815 identified speakers, making it significantly larger than previous Persian speech datasets. The creation pipeline includes fine-tuning a ParsBERT model, optimizing audio boundaries, restoring punctuation, identifying speakers, and assessing quality. The effectiveness of ParsVoice was demonstrated by fine-tuning XTTS, a multilingual TTS model, which achieved notable naturalness and speaker similarity scores.
AI IMPACT This large-scale corpus aims to improve the quality and availability of Persian text-to-speech systems, potentially enabling new applications and research in low-resource languages.
RANK_REASON The cluster describes a new academic paper detailing the creation of a large-scale speech corpus for a specific language.
AI-generated summary · Google Gemini · from 1 source.