Researchers have developed a new dataset and framework for improving Romanian-accented speech recognition, specifically for parliamentary proceedings. The ROManian PARliamentary Speech Corpus (ROMPAR) contains 17.80 hours of Romanian and Moldavian parliamentary speech, with double annotations and labels for reconstructed word fragments. The authors implemented a multi-task adversarial training framework to encourage invariance across age, gender, and dialect, and paired it with an LLM-guided decoding strategy that morphologically completes truncated words. The approach is reported to reduce word error rate significantly and to reach a 96.6% F1-score in morphological reconstruction.
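The two metrics named above can be illustrated with a minimal sketch. It assumes the standard definitions (word error rate as word-level edit distance divided by reference length, F1 as the harmonic mean of precision and recall); the summary does not say how the paper computes either one, and the function names `word_error_rate` and `f1_from_counts` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy strings and counts, invented for illustration; not ROMPAR data.
    print(word_error_rate("a b c d", "a x c"))
    print(f1_from_counts(true_pos=8, false_pos=2, false_neg=2))
```

In the first toy call, one substitution and one deletion against a four-word reference give a WER of 0.5. How the paper counts true and false positives for reconstructed fragments is not stated in the summary, so `f1_from_counts` takes the counts as given.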
AI-generated summary · Google Gemini · from 1 source.