New Romanian speech corpus tackles demographic bias in parliamentary ASR

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have released a new dataset and training framework for Romanian-accented speech recognition in parliamentary proceedings. The ROManian PARliamentary Speech Corpus (ROMPAR) contains 17.80 hours of Romanian and Moldavian parliamentary speech, with double annotations and labels for reconstructed word fragments. The authors pair the corpus with a multi-task adversarial training framework intended to make the recognizer invariant to speaker age, gender, and dialect, and with an LLM-guided decoding strategy that completes words truncated during segmentation. According to the summary, the approach significantly reduced word error rate and achieved a 96.6% F1-score in morphological reconstruction.
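For readers unfamiliar with the headline metric, word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); real evaluations usually
    normalize casing and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion of a reference word
                curr[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("the session is open", "the session open"))  # 0.25
```

A truncated word that the recognizer fails to complete counts as a substitution under this definition, which is one reason fragment reconstruction can lower WER.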




AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English (EN) · Andrei-Marius Avram, Aureliu-Valentin Antonie, Ștefan-Bogdan Badea, Andrei Florea, Robert-Nicolae Zaharoiu, Dumitru-Clementin Cercel · 2026-06-16 04:00

ROMPAR: Morphological Completion and Demographic Unlearning for Romanian-Accented Speech Recognition

arXiv:2606.15984v1. Abstract: Automated transcription of parliamentary proceedings faces significant hurdles due to demographic bias, dialectal variation, and technical artifacts such as utterance truncation during segmentation. This paper introduces the ROMania…

