Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

Linguistically Augmented Audio Speech Data (LinguAS)

Researchers have introduced Linguistically Augmented Audio Speech Data (LinguAS), a new dataset designed to help detect deepfaked audio. LinguAS contains more than 800 audio samples, both genuine and fake, annotated with five linguistic features that are characteristic of natural human speech. Models trained on LinguAS that combined these linguistic cues with audio features detected audio deepfakes significantly better than existing baselines.
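
The idea of pairing linguistic cues with audio features can be illustrated with a minimal sketch. It is not the authors' method: the brief does not name the five linguistic features, the model architecture, or the fusion strategy, so the function names, the simple concatenation, and the logistic scorer below are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

# The brief says each sample carries five linguistic features but does not
# name them, so they are treated here as an unnamed 5-value vector (assumption).
NUM_LINGUISTIC_FEATURES = 5


def fuse_features(audio_embedding: Sequence[float],
                  linguistic_features: Sequence[float]) -> List[float]:
    """Concatenate an audio embedding with per-clip linguistic features.

    Plain concatenation is an assumed fusion strategy, not one confirmed
    by the brief.
    """
    if len(linguistic_features) != NUM_LINGUISTIC_FEATURES:
        raise ValueError("expected five linguistic feature values")
    return list(audio_embedding) + list(linguistic_features)


def fake_probability(fused: Sequence[float],
                     weights: Sequence[float],
                     bias: float = 0.0) -> float:
    """Score a fused vector with a hypothetical logistic linear classifier."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


# Toy values only: a 3-dimensional stand-in for a real audio embedding
# and made-up linguistic annotations.
fused = fuse_features([0.2, -0.1, 0.4], [1, 0, 0, 1, 0])
score = fake_probability(fused, [0.0] * len(fused))
```

In practice the audio embedding would come from a speech encoder and the weights from training on labeled genuine and fake clips; the toy call above only shows how the two feature groups end up in one input vector.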

IMPACT Improves AI's ability to detect sophisticated audio deepfakes by incorporating linguistic analysis.

HuBERT
XLSR
ASVspoof 2021
deepfaked audio