Speech models encode child age/gender in early layers, study finds

By PulseAugur Editorial · [1 source] · 2026-06-20 18:16

Researchers have analyzed how well self-supervised learning (SSL) models capture age and gender information in children's speech. The study examined four models (Wav2Vec2, HuBERT, Data2Vec, and WavLM) layer by layer, using the PFSTAR and CMU Kids datasets. Results indicate that early to mid-level layers encode these paralinguistic cues most effectively. HuBERT performed best for age classification, while Wav2Vec2 and HuBERT led in gender classification.
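A layer-wise analysis of this kind is often run as a probing experiment: freeze the model, pool each layer's frame-level outputs into one vector per utterance, train a small classifier on each layer separately, and compare accuracies. The sketch below illustrates that loop only. It is not the study's code: the nearest-centroid probe, the mean pooling, the function names, and the synthetic three-layer "hidden states" are all assumptions made for illustration, and no real model or dataset is involved.

```python
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors (T x D) into one utterance-level vector."""
    return [fmean(dim) for dim in zip(*frames)]


def fit_centroids(vectors, labels):
    """Compute one mean vector per class label."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: mean_pool(vecs) for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=sq_dist)


def probe_layers(train, test):
    """Fit and score one probe per layer; returns a list of accuracies."""
    n_layers = len(train[0][0])
    train_labels = [label for _, label in train]
    accuracies = []
    for layer in range(n_layers):
        centroids = fit_centroids(
            [mean_pool(layers[layer]) for layers, _ in train], train_labels
        )
        hits = sum(
            predict(centroids, mean_pool(layers[layer])) == label
            for layers, label in test
        )
        accuracies.append(hits / len(test))
    return accuracies


def make_utterance(label, noise):
    """Hypothetical stand-in for hidden states: 3 layers, 2 frames, 2 dims."""
    s = 1.0 if label == "older" else -1.0
    layers = [
        [[noise, noise], [noise, noise]],                  # layer 0: no label signal
        [[s + noise, 0.0], [s - noise, 0.0]],              # layer 1: clear label signal
        [[0.1 * s + noise, 0.0], [0.1 * s + noise, 0.0]],  # layer 2: weak signal
    ]
    return layers, label


# Synthetic data only; real experiments would use model outputs per utterance.
train = [make_utterance(l, n) for l, n in
         [("older", 0.0), ("older", 0.1), ("younger", 0.0), ("younger", 0.1)]]
test = [make_utterance(l, n) for l, n in
        [("older", 0.0), ("older", -0.3), ("younger", 0.0), ("younger", 0.3)]]

accuracies = probe_layers(train, test)
best_layer = max(range(len(accuracies)), key=accuracies.__getitem__)
print(accuracies, best_layer)
```

In this toy setup the middle layer scores highest by construction, which mirrors the shape of the reported finding without reproducing it. With a real SSL model, the per-layer frames would come from the model's hidden states, and the probe would typically be a linear or shallow classifier.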

IMPACT This research clarifies how AI speech models represent age and gender in children's speech, which could improve downstream applications such as voice assistants and educational tools.

RANK_REASON Academic paper analyzing self-supervised speech models.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Sudarsana Reddy Kadiri · 2026-06-20 18:16

How Well Do Self-Supervised Speech Models Encode Age and Gender in Children's Speech? A Layer-Wise Analysis Across Multiple Architectures

Self-supervised learning (SSL) models have become a central component of modern speech processing systems, as they enable the learning of rich acoustic representations without reliance on labeled data. Despite their success on adult speech, it remains unclear how effectively thes…
