Researchers have developed an approach to speaker diarization, the task of determining who spoke when in an audio recording, aimed at low-resource languages such as Nepali and Hindi. They trained two neural network architectures, EEND-EDA and DiaPer, on a multilingual dataset that included English, diverse speaker recordings, and newly collected Nepali and Hindi audio. The DiaPer model, which uses Perceiver-based attractors, achieved significantly lower diarization error rates (DERs) than EEND-EDA on the Nepali-Hindi test sets, particularly in challenging multi-speaker scenarios.
AI IMPACT This research advances speaker diarization for underrepresented languages and could improve accessibility and information retrieval tools for diverse linguistic communities.
RANK_REASON Academic paper detailing a new model architecture and evaluation on specific datasets.
AI-generated summary · Google Gemini · from 1 source.