How Well Do Self-Supervised Speech Models Encode Age and Gender in Children's Speech? A Layer-Wise Analysis Across Multiple Architectures

Study finds speech models encode children's age and gender in their early-to-middle layers

By the PulseAugur editorial team · [1 source] · 2026-06-20 18:16

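Researchers analyzed how effectively self-supervised learning (SSL) models capture age and gender information in children's speech. The study focuses on four models (Wav2Vec2, HuBERT, Data2Vec, and WavLM) and examines their individual layers using the PFSTAR and CMU Kids datasets. The results show that the early-to-middle layers are the most effective at encoding these paralinguistic cues. HuBERT performed best on age classification, while Wav2Vec2 and HuBERT led on gender classification.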
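A layer-wise analysis of this kind generally means fitting a simple classifier (a "probe") on the representations from each layer and comparing accuracies across layers. The sketch below illustrates that idea under loudly stated assumptions: it uses small synthetic features instead of real Wav2Vec2/HuBERT/Data2Vec/WavLM hidden states, mean-pools frame vectors into one vector per utterance, and uses a nearest-centroid probe as a stand-in. The summary does not say which classifier or pooling the authors actually used, and every function name here (`mean_pool`, `fit_centroids`, `predict`, `probe_layers`, `make_utterance`) is hypothetical.

```python
from statistics import fmean


def mean_pool(frames):
    """Average a list of frame vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [fmean(f[d] for f in frames) for d in range(dim)]


def fit_centroids(vectors, labels):
    """Nearest-centroid probe (a stand-in classifier): one mean vector per label."""
    by_label = {}
    for vec, lab in zip(vectors, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: mean_pool(vecs) for lab, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance.
    Ties go to the label seen first during training."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def probe_layers(train, test):
    """train/test: lists of (per_layer_frames, label), where per_layer_frames[l]
    holds the frame vectors from layer l. Returns test accuracy per layer."""
    num_layers = len(train[0][0])
    accuracies = []
    for layer in range(num_layers):
        x_train = [mean_pool(frames[layer]) for frames, _ in train]
        y_train = [lab for _, lab in train]
        centroids = fit_centroids(x_train, y_train)
        correct = sum(
            predict(centroids, mean_pool(frames[layer])) == lab
            for frames, lab in test
        )
        accuracies.append(correct / len(test))
    return accuracies


def make_utterance(label, jitter):
    """Hypothetical synthetic utterance with three layers of 2-D frame features,
    constructed so that only layer 1 carries the (gender) label."""
    sign = 1.0 if label == "F" else -1.0
    layers = [
        [[0.0, 0.0], [0.0, 0.0]],                      # layer 0: no label information
        [[sign + jitter, 0.5], [sign - jitter, 0.5]],  # layer 1: label clearly encoded
        [[jitter, 0.0], [jitter, 0.0]],                # layer 2: label-independent
    ]
    return layers, label


if __name__ == "__main__":
    train = [make_utterance("F", j) for j in (0.0, 0.1)] + [
        make_utterance("M", j) for j in (0.0, 0.1)
    ]
    test = [make_utterance("F", 0.05), make_utterance("M", 0.05)]
    accs = probe_layers(train, test)
    best = max(range(len(accs)), key=accs.__getitem__)
    print(accs, best)  # prints: [0.5, 1.0, 0.5] 1
```

On real data, one would replace `make_utterance` with the hidden states a model returns for each layer and swap the nearest-centroid probe for whatever classifier the study specifies; the per-layer accuracy curve is what reveals where age or gender information peaks.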

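Impact: This research clarifies how AI models represent age and gender in children's speech, which could improve downstream applications such as voice assistants and educational tools.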

Ranking rationale: An academic paper analyzing self-supervised speech models.

Read on arXiv cs.AI.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Sudarsana Reddy Kadiri · 2026-06-20 18:16

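How Well Do Self-Supervised Speech Models Encode Age and Gender in Children's Speech? A Layer-Wise Analysis Across Multiple Architectures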

Self-supervised learning (SSL) models have become a central component of modern speech processing systems, as they enable the learning of rich acoustic representations without reliance on labeled data. Despite their success on adult speech, it remains unclear how effectively thes…
