Balalaika pipeline enhances Russian speech data with prosody-aware annotations

By PulseAugur Editorial Team · [1 source] · 2026-05-05 04:00

Researchers have developed Balalaika, an open-source pipeline for annotating Russian speech data with a focus on prosody. The system combines semantic voice activity detection, multi-ASR ensembling, and automatic quality filtering, and was used to build a 5.1k-hour corpus. The pipeline also enriches the transcripts with punctuation, lexical stress, and phoneme normalization, with reported consistent improvements in speech denoising and text-to-speech synthesis.
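The paper's abstract names ROVER consensus as the method for the multi-ASR ensembling step. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below aligns each hypothesis to the first one and takes a word-level majority vote. The function name `rover_like_vote` and the use of Python's `difflib` for alignment are assumptions made for this example; real ROVER builds a word transition network and can also weigh confidence scores.

```python
from collections import Counter
from difflib import SequenceMatcher


def rover_like_vote(hypotheses):
    """Word-level majority vote across ASR hypotheses (simplified, ROVER-style).

    Hypothetical helper: the first hypothesis serves as the alignment pivot.
    """
    tokenized = [h.split() for h in hypotheses]
    pivot = tokenized[0]
    # One ballot box per pivot word; the pivot votes for its own words.
    slots = [Counter({word: 1}) for word in pivot]

    for other in tokenized[1:]:
        matcher = SequenceMatcher(a=pivot, b=other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                for k in range(i2 - i1):
                    slots[i1 + k][other[j1 + k]] += 1
            elif tag == "replace":
                # Pair words positionally; leftover pivot words get an empty vote.
                for k in range(i2 - i1):
                    j = j1 + k
                    slots[i1 + k][other[j] if j < j2 else ""] += 1
            elif tag == "delete":
                # The other system has no word here: vote for "no word".
                for k in range(i1, i2):
                    slots[k][""] += 1
            # "insert": words missing from the pivot are dropped in this sketch.

    # Ties go to the candidate seen first, which favors the pivot's word.
    winners = [slot.most_common(1)[0][0] for slot in slots]
    return " ".join(word for word in winners if word)


if __name__ == "__main__":
    outputs = [
        "привет мир так дела",
        "привет мир как дела",
        "привет мир как дела",
    ]
    print(rover_like_vote(outputs))
```

Because insertions relative to the pivot are discarded, the result depends on which system is listed first; a fuller implementation would merge all hypotheses into a shared alignment before voting.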

Impact: Introduces a new pipeline for processing and annotating Russian speech data, potentially improving downstream speech synthesis and denoising models.

Ranking rationale: This is a research paper describing a new data annotation pipeline for speech.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Kirill Borodin, Nikita Vasiliev, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Grach Mkrtchian · 2026-05-05 04:00

Balalaika: Data-Centric, Prosody-Aware Annotation Pipeline for Russian Speech

arXiv:2507.13563v2 Announce Type: replace Abstract: We introduce Balalaika, an open-source, data-centric pipeline for processing audio and producing prosody-aware annotations. It combines semantic VAD for context-preserving segmentation, multi-ASR ensembling with ROVER consensus …
