Researchers have developed a step-by-step method for video-to-audio (V2A) synthesis that offers finer control and more realistic results. The approach mimics traditional Foley workflows: complementary sounds are created incrementally, so users can author each of the multiple sound events triggered by the video content. Each generation step employs negative audio guidance, which discourages the duplication of existing sounds and reduces reliance on expensive multi-reference datasets. The system is trained on standard single-reference audiovisual datasets by finetuning a pre-trained V2A model to leverage acoustic context while remaining visually grounded.
AI IMPACT This V2A synthesis method could enable more sophisticated and controllable audio post-production for video content.
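The negative audio guidance mentioned in the summary can be illustrated with a minimal sketch. It assumes a classifier-free-guidance-style sampler in which three denoiser predictions are combined at each step. The summary does not give the paper's actual formula, so the function name `guided_prediction`, the weights `w_video` and `w_neg`, and the combination rule itself are all hypothetical.

```python
from typing import List, Sequence


def guided_prediction(
    uncond: Sequence[float],
    video_cond: Sequence[float],
    audio_neg: Sequence[float],
    w_video: float = 4.5,  # hypothetical default, not taken from the paper
    w_neg: float = 1.5,    # hypothetical default, not taken from the paper
) -> List[float]:
    """Combine three denoiser outputs elementwise (illustrative only).

    uncond     -- prediction with no conditioning
    video_cond -- prediction conditioned on the video
    audio_neg  -- prediction conditioned on audio that already exists

    The result moves toward the video-conditioned direction and away from
    the direction that would reproduce the existing audio.
    """
    if not (len(uncond) == len(video_cond) == len(audio_neg)):
        raise ValueError("all predictions must have the same length")
    return [
        u + w_video * (v - u) - w_neg * (a - u)
        for u, v, a in zip(uncond, video_cond, audio_neg)
    ]


# Toy vectors standing in for real model outputs.
step = guided_prediction([0.0, 0.0], [1.0, 0.5], [1.0, 0.0], w_video=2.0, w_neg=1.0)
print(step)
```

Setting `w_neg` to zero reduces this sketch to ordinary video-conditioned guidance. Larger values push the output further from the already-generated sound, which is the intuition behind discouraging duplicates.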
RANK_REASON Academic paper detailing a new method for video-to-audio synthesis.
AI-generated summary · Google Gemini · from 1 source.