PulseAugur

New ParsVoice Corpus Boosts Persian TTS Capabilities

Researchers have introduced ParsVoice, a large new corpus of Persian speech and text data designed to advance text-to-speech (TTS) synthesis and other speech processing tasks for Persian. The dataset comprises 2,200 hours of TTS-ready audio with over 1.36 million aligned segments from 1,815 identified speakers, which makes it significantly larger than previous Persian speech datasets. It was built with a multi-stage pipeline that includes fine-tuning a ParsBERT model, optimizing audio boundaries, restoring punctuation, identifying speakers, and assessing quality. To demonstrate the corpus's effectiveness, the authors fine-tuned XTTS, a multilingual TTS model, on ParsVoice and report strong naturalness and speaker-similarity scores.
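For a rough sense of scale, the headline figures can be turned into per-segment and per-speaker averages. This is a back-of-envelope sketch, not a result reported in the paper. It treats the rounded figures (2,200 hours, 1.36 million segments) as exact. Because the true segment count is "over" 1.36 million, the computed segment length is an upper bound. It also assumes that every segment belongs to one of the 1,815 identified speakers, which the summary does not state. The function name `corpus_averages` is hypothetical.

```python
# Headline figures from the ParsVoice summary (rounded; segment count is "over" 1.36M).
TOTAL_HOURS = 2_200
SEGMENTS = 1_360_000
SPEAKERS = 1_815


def corpus_averages(total_hours: float, segments: int, speakers: int) -> dict[str, float]:
    """Hypothetical helper: back-of-envelope averages from the headline corpus figures."""
    total_seconds = total_hours * 3600
    return {
        # Upper bound, since the real segment count is larger than `segments`.
        "seconds_per_segment": total_seconds / segments,
        # Assumes all audio is attributed to the identified speakers.
        "hours_per_speaker": total_hours / speakers,
        "segments_per_speaker": segments / speakers,
    }


if __name__ == "__main__":
    for name, value in corpus_averages(TOTAL_HOURS, SEGMENTS, SPEAKERS).items():
        print(f"{name}: {value:.2f}")
```

Run as a script, it prints about 5.82 seconds per segment, 1.21 hours per speaker, and 749.31 segments per speaker. These figures are averages only. The actual distribution across speakers is likely to be uneven.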

IMPACT This large-scale corpus aims to significantly improve the quality and availability of Persian text-to-speech systems, potentially enabling new applications and research in low-resource languages.

RANK_REASON The cluster describes a new academic paper detailing the creation of a large-scale speech corpus for a specific language.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery

    ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

    arXiv:2510.10774v3 · Announce Type: replace-cross

    Abstract: Persian remains substantially underrepresented in open speech-text resources, limiting progress in multi-speaker text-to-speech (TTS), speech-language modelling, and low-resource speech processing. We introduce ParsVoice, …