PulseAugur
VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

New VieSpeaker dataset strengthens Vietnamese speaker recognition without visual cues · Tracking 3 sources

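Researchers have introduced VieSpeaker, a new large-scale dataset for Vietnamese speaker recognition that does not depend on visual cues. The dataset was built with a novel pipeline that uses textual metadata and large language model reasoning to infer speaker identities, overcoming the requirement in existing corpora that speakers appear on camera. VieSpeaker contains about 902 hours of speech from 4,715 speakers, and models trained on it show improved robustness and generalization compared with models trained on existing Vietnamese datasets.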

Impact: Provides a new resource for advancing speaker recognition, particularly for low-resource languages such as Vietnamese.

Ranking rationale: This cluster covers a new academic dataset and its research paper.

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Viet Hoang Pham, Tran Trung Nguyen, Bao Thu Ho, Phuong Tuan Dat, Thi Thu Trang Nguyen ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    arXiv:2606.24066v1 (announce type: cross). Abstract: Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to l…

  2. arXiv cs.CL TIER_1 English(EN) · Thi Thu Trang Nguyen ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to link speech with speaker identities, restricting da…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

    Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to link speech with speaker identities, restricting da…