New dataset and benchmark launched for audio-visual video editing

By PulseAugur Editorial · 1 source · 2026-06-03 04:00

Researchers have introduced JAVEdit-100k, a large-scale dataset for instruction-guided joint audio-visual video editing. The dataset contains approximately 100,000 editing triplets across five categories, produced by a new generation pipeline with agent-in-the-loop quality control. To standardize evaluation, the authors also built JAVEditBench, a comprehensive benchmark, and proposed JAVEdit, a baseline model that they report achieves superior performance on multiple metrics.

IMPACT Could enable more capable AI-driven video editing by providing dedicated training and evaluation resources for editing audio and video together, a task the authors say has lacked such datasets and benchmarks.

RANK_REASON The cluster contains a new academic paper introducing a dataset, benchmark, and baseline model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Yinan Chen, Chuming Lin, Zhennan Chen, Yuxiang Zeng, Junwei Zhu, Yali Bi, Xijie Huang, Chengming Xu, Donghao Luo, Zhucun Xue, Xiaobin Hu, Chengjie Wang, Yong Liu, Jiangning Zhang, Shuicheng Yan · 2026-06-03 04:00

JAVEDIT: Joint Audio-Visual Instruction-Guided Video Editing with Agentic Data Curation

arXiv:2606.03168v1 · Abstract: While instruction-based video editing has seen significant progress, joint audio-visual editing remains constrained by the absence of dedicated datasets and benchmarks. To bridge this gap, we present JAVEdit-100k, the first large-sc…