Researchers have introduced FalAR, a new large-scale speech corpus of European Portuguese parliamentary sessions, aimed at improving Automatic Speech Recognition (ASR) for the language. The corpus contains approximately 5,800 hours of speech spanning 20 years, with speaker identity annotations for 1,180 individuals. Experiments show that pre-training on FalAR can significantly improve ASR performance, reducing Word Error Rate (WER) by up to 14%.
IMPACT This corpus aims to significantly improve ASR performance for European Portuguese, addressing its shortage of speech resources relative to Brazilian Portuguese.
RANK_REASON The cluster contains a research paper detailing a new dataset for ASR.
AI-generated summary · Google Gemini · from 2 sources.