PulseAugur

Speech models encode speaker demographics, impacting fairness

A new research paper examines how self-supervised speech recognition models (S3Ms) encode information about speaker groups. The study found that these models encode characteristics such as gender, age, dialect, ethnicity, and native-speaker status. Fine-tuning the models for speaker identification (SID) or automatic speech recognition (ASR) changes which kinds of speaker-group information are retained: ASR fine-tuning discards phonetic variations while keeping semantic ones. The research suggests these findings could aid in developing fairer ASR algorithms.

IMPACT Understanding how models encode sensitive demographic data could lead to more equitable ASR systems.

RANK_REASON The cluster contains an academic paper detailing research findings on AI models.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Felix Herron, Solange Rossato, Alexandre Allauzen, Benoit Favre, François Portet ·

    Speaker Group Encoding in Self-supervised Speech Recognition Models

    arXiv:2606.10654v1. Abstract: We investigate what self-supervised speech recognition models (S3Ms) learn about speaker groups (SGs). We examine several states of S3Ms: pretrained, finetuned on speaker identification (SID), finetuned on automatic speech recogniti…

  2. arXiv cs.CL TIER_1 English(EN) · François Portet ·

    Speaker Group Encoding in Self-supervised Speech Recognition Models

    We investigate what self-supervised speech recognition models (S3Ms) learn about speaker groups (SGs). We examine several states of S3Ms: pretrained, finetuned on speaker identification (SID), finetuned on automatic speech recognition (ASR), and ASR-finetuned using a fairness enh…