Researchers have introduced Goku, a new dataset and benchmark designed for instruction-based video editing. Goku comprises 2 million video editing pairs, expanding beyond simple appearance edits to include complex multi-task and structural manipulations. The accompanying Goku-Edit model, which utilizes a multimodal large language model for instruction comprehension, demonstrates an improvement of up to 8% over existing open-source models on the newly proposed Goku-Bench benchmark.
AI IMPACT Advances capabilities in instruction-based video editing, potentially enabling more complex and creative video manipulation tools.
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Goku
- Goku-Bench
- Goku-Edit
- Gotit.pub
- Hugging Face
- multimodal large language model
- ScienceCast
AI-generated summary · Google Gemini · from 3 sources.