Researchers have introduced VieSpeaker, a new large-scale dataset for Vietnamese speaker recognition that does not rely on visual cues. This dataset was constructed using a novel pipeline that leverages textual metadata and large language model reasoning to infer speaker identities, overcoming limitations of existing corpora that require speakers to be on camera. VieSpeaker comprises approximately 902 hours of speech from 4,715 speakers and has demonstrated improved robustness and generalization for trained models compared to existing Vietnamese datasets.
IMPACT: Provides a new resource for advancing speaker recognition technology, particularly for under-resourced languages like Vietnamese.
RANK_REASON: The cluster describes a new academic dataset and research paper.
AI-generated summary · Google Gemini · from 3 sources.