Researchers have developed a new dataset and framework for improving Romanian-accented speech recognition, specifically for parliamentary proceedings. The ROManian PARliamentary Speech Corpus (ROMPAR) includes 17.80 hours of Romanian and Moldavian parliamentary speech, with double annotations and labels for reconstructed word fragments. A multi-task adversarial training framework was implemented to ensure demographic invariance across age, gender, and dialect, along with an LLM-guided decoding strategy for morphological completion of truncated words. This approach significantly reduced word error rate and achieved a 96.6% F1-score in morphological reconstruction.
Ranking rationale: The cluster contains an academic paper detailing a new dataset and framework for a specific NLP task.
AI-generated summary · Google Gemini · from 1 source.