WavLM
PulseAugur coverage of WavLM — every cluster mentioning WavLM across labs, papers, and developer communities, ranked by signal.
-
Speech models encode child age/gender in early layers, study finds
Researchers have analyzed how well self-supervised learning (SSL) models capture age and gender information in children's speech. The study focused on four models: Wav2Vec2, HuBERT, Data2Vec, and WavLM, examining their …
-
New toolkits simplify syllable-level speech tokenization for AI models
Two research papers introduce toolkits for syllable-level speech tokenization, aiming to improve spoken language modeling. The first, "findsylls," offers a language-agnostic toolkit that unifies various syllab…
-
WavSLM simplifies speech generation with distilled WavLM representations
Researchers have developed WavSLM, a novel speech language model that simplifies the generation of coherent speech by distilling self-supervised WavLM representations into a single codebook. This approach allows WavSLM …
-
New discrete optimal transport attack targets speaker verification systems
Researchers have developed a novel adversarial attack method using discrete optimal transport (DOT) that targets automatic speaker verification (ASV) and anti-spoofing systems. This black-box attack operates by aligning…
-
ASR models advance with new architectures and vast supervised data
Automatic Speech Recognition (ASR) is advancing rapidly, driven by two factors: the growing availability of pseudo-labeled data and the emergence of new model architectures. While models l…
-
New method explains deepfake speech detector decisions
Researchers have developed a new method to understand how deepfake speech detectors make their decisions. By using Integrated Gradients on self-supervised representations, the technique can pinpoint specific moments in …
-
New voice conversion method uses KNN for non-parallel data
Researchers have developed a novel voice conversion framework that uses K-Nearest Neighbors (KNN) retrieval on WavLM representations to align non-parallel speech data. This method constructs synthetic training pairs fro…
-
New framework improves speech confidence detection using Whisper
Researchers have developed a new semi-supervised framework for detecting speaker confidence in speech, addressing the challenge of limited labeled data. This approach combines deep semantic embeddings from OpenAI's Whis…
-
WavCube model unifies speech understanding and generation with compressed representation
Researchers have developed WavCube, a novel speech representation model designed to unify speech understanding and generation tasks. This model utilizes a compact continuous latent space derived from a self-supervised l…
-
Phoneme-level analysis improves detection of emotionally manipulated synthetic speech
Researchers have developed a new method for detecting deepfake audio by analyzing speech at the phoneme level. This approach, which uses self-supervised embeddings, proved more effective than previous methods that treat…
-
Researchers explore quantum and deep learning for audio deepfake detection
Two research papers submitted to the Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2) in 2026 propose novel deep-learning frameworks for detecting manipulated audio. The first paper introduces a d…
-
New GRIDS framework detects anomalies in self-supervised speech models
Researchers have developed a new framework called GRIDS to analyze how perturbations affect the internal representations of self-supervised speech models. By using Local Intrinsic Dimensionality (LID), the framework can…
-
LASE model improves cross-script voice cloning by making embeddings language-uninformative
Researchers have developed LASE, a Language-Adversarial Speaker Encoder, to improve multilingual voice cloning. Standard encoders struggle to maintain speaker identity across different scripts, particularly when project…