Brief · PulseAugur

RESEARCH · arXiv cs.CV English(EN) · 3d · [5 sources]

Robust Spoofed Speech Detection via Temporal Pyramid Modeling

Researchers are developing advanced methods to detect spoofed speech, a growing challenge as speech synthesis and voice conversion technologies become more realistic. One approach, the Temporal Pyramid Adapter, uses parallel temporal convolutions with varying receptive fields to capture multi-scale spoofing cues and builds on self-supervised representations such as XLS-R. Another study introduces ArFake, the first multi-dialect Arabic spoofed speech dataset, to address the limited research in this area. A third paper converts self-supervised speech models into Mixture-of-Experts architectures to improve generalization and robustness against unseen synthesis methods, reporting a significant relative reduction in error.
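To make the "parallel temporal convolutions with varying receptive fields" idea concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the function names, the fixed averaging kernel, the dilation rates (1, 2, 4), and the residual connection are all assumptions chosen for illustration, and a real adapter would use learned weights on top of frame-level features from a model such as XLS-R.

```python
from typing import List, Sequence

Frames = List[List[float]]  # T frames, each with D feature dimensions


def dilated_conv1d(x: Frames, kernel: Sequence[float], dilation: int) -> Frames:
    """Depthwise 1-D convolution along time, zero-padded to keep length T."""
    T, D = len(x), len(x[0])
    half = (len(kernel) - 1) // 2
    out = [[0.0] * D for _ in range(T)]
    for t in range(T):
        for k, w in enumerate(kernel):
            src = t + (k - half) * dilation  # frame this tap reads from
            if 0 <= src < T:
                for d in range(D):
                    out[t][d] += w * x[src][d]
    return out


def temporal_pyramid(x: Frames, dilations: Sequence[int] = (1, 2, 4)) -> Frames:
    """Hypothetical pyramid block: parallel branches at several time scales.

    With a 3-tap kernel, dilation r covers 2*r + 1 frames, so the default
    branches see 3, 5, and 9 frames respectively.
    """
    kernel = [1 / 3, 1 / 3, 1 / 3]  # assumption: fixed averaging, not learned
    branches = [dilated_conv1d(x, kernel, r) for r in dilations]
    T, D = len(x), len(x[0])
    # Average the branches and add the input back (residual, adapter-style).
    return [
        [x[t][d] + sum(b[t][d] for b in branches) / len(branches) for d in range(D)]
        for t in range(T)
    ]
```

Feeding in a single impulse frame shows how each branch spreads the signal over a different temporal span, which is the sense in which short artifacts and longer-range inconsistencies can be picked up side by side.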

arXiv
Mixture-of-Experts
DagsHub
alphaXiv
Hugging Face
ASVspoof 2021
Temporal Pyramid Adapter
ASVspoof 2017
PartialSpoof
DiffSSD
HQ-MPSD
LCNN-BLSTM
ArFake
FishSpeech
RawNet2
Arabic
TRACE
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Mel Frequency Cepstral Coefficients