Researchers have developed SSNAPS, an unsupervised method that uses audio-visual cues to separate speech from background noise. The approach relies on diffusion inverse sampling: clean speech and ambient noise are modeled with separate diffusion priors, and all sources are reconstructed. It achieves a lower word error rate than supervised baselines across a range of noisy conditions, and it also handles multiple speakers and off-screen separation. The separated noise component is reconstructed with enough fidelity to support downstream acoustic scene detection.
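To make the idea of sampling two sources under separate priors concrete, here is a minimal toy sketch. It is not the SSNAPS implementation: the learned diffusion priors are replaced by one-dimensional Gaussians whose scores are known in closed form, the sampler is plain annealed Langevin dynamics, and every name and parameter (`separate_toy`, `posterior_means`, `var_y`, the noise schedule) is hypothetical and chosen only for illustration.

```python
import numpy as np


def separate_toy(y, mu_s=0.0, var_s=1.0, mu_n=2.0, var_n=0.25, var_y=0.05,
                 n_chains=2000, n_levels=20, steps_per_level=50,
                 step=0.01, seed=0):
    """Sample (speech, noise) pairs consistent with the mixture y = s + n.

    Assumption: Gaussian priors stand in for the two learned diffusion
    priors, so each prior's score is available analytically.
    """
    rng = np.random.default_rng(seed)
    # Noise levels annealed from coarse to fine, as in score-based samplers.
    sigmas = np.geomspace(1.0, 0.01, n_levels)
    s = rng.normal(size=n_chains)  # speech estimate, one value per chain
    n = rng.normal(size=n_chains)  # noise estimate, one value per chain
    for sigma in sigmas:
        for _ in range(steps_per_level):
            # Scores of the noise-perturbed priors (one prior per source).
            score_s = -(s - mu_s) / (var_s + sigma**2)
            score_n = -(n - mu_n) / (var_n + sigma**2)
            # Gradient of the Gaussian log-likelihood of y given s + n;
            # it is the same for both sources because y depends on their sum.
            data_grad = (y - s - n) / var_y
            s_next = (s + step * (score_s + data_grad)
                      + np.sqrt(2.0 * step) * rng.normal(size=n_chains))
            n_next = (n + step * (score_n + data_grad)
                      + np.sqrt(2.0 * step) * rng.normal(size=n_chains))
            s, n = s_next, n_next
    return s, n


def posterior_means(y, mu_s=0.0, var_s=1.0, mu_n=2.0, var_n=0.25, var_y=0.05):
    """Closed-form posterior means for the same toy Gaussian model."""
    gain = (y - mu_s - mu_n) / (var_s + var_n + var_y)
    return mu_s + var_s * gain, mu_n + var_n * gain


if __name__ == "__main__":
    s_samples, n_samples = separate_toy(3.0)
    print("sampled means:", s_samples.mean(), n_samples.mean())
    print("analytic means:", posterior_means(3.0))
```

In this toy setting the sample means can be checked against the closed-form posterior, which is what makes Gaussians a convenient stand-in. With learned priors no such closed form exists, and the scores would come from trained networks instead.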
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Diffusion Inverse Sampling
- Gotit.pub
- Hugging Face
- Influence Flower
- ScienceCast
- SSNAPS
- Yochai Yemini
AI-generated summary · Google Gemini · from 1 source.