PulseAugur

New VieSpeaker Dataset Enhances Vietnamese Speaker Recognition Without Visual Cues · 3 sources tracked

Researchers have introduced VieSpeaker, a large-scale dataset for Vietnamese speaker recognition that does not rely on visual cues. It was built with a novel pipeline that uses textual metadata and large language model reasoning to infer speaker identities, avoiding the facial-cue linking that most large-scale corpora depend on, which requires speakers to appear on camera. VieSpeaker comprises approximately 902 hours of speech from 4,715 speakers, and models trained on it showed better robustness and generalization than models trained on existing Vietnamese datasets.

IMPACT Provides a new resource for advancing speaker recognition technology, particularly for under-resourced languages like Vietnamese.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.CL TIER_1 English(EN) · Viet Hoang Pham, Tran Trung Nguyen, Bao Thu Ho, Phuong Tuan Dat, Thi Thu Trang Nguyen ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    arXiv:2606.24066v1. Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to l…

  2. arXiv cs.CL TIER_1 English(EN) · Thi Thu Trang Nguyen ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to link speech with speaker identities, restricting da…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to link speech with speaker identities, restricting da…