PulseAugur

New benchmarks and models advance AI image editing capabilities

Researchers have introduced new benchmarks and datasets for evaluating image editing models, targeting gaps in current systems. VINS-120K is a large-scale dataset for instruction-based ultra-high-resolution image editing, while VDE Bench evaluates how well models modify visual documents with dense text in multiple languages. VIBE, another benchmark, assesses models' ability to follow visual instructions, revealing that proprietary models currently outperform open-source alternatives but still struggle with complex tasks. A further paper, Uni-Edit, frames intelligent editing as a general task for tuning unified multimodal models. Additionally, Together AI has announced FLUX.1 Kontext models, which enable in-context image generation and editing using both text and image prompts without requiring fine-tuning.

IMPACT New benchmarks, datasets and models target more precise control and higher resolutions in AI image editing.

RANK_REASON The cluster contains multiple research papers introducing new benchmarks and datasets for image editing, alongside a product launch of new models.


AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

  1. arXiv cs.CV TIER_1 English(EN) · Zhizhou Chen, Shanyan Guan, Zhanxin Gao, En Ci, Yanhao Ge, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai ·

    VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

    arXiv:2605.23518v1 Announce Type: new Abstract: Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-…

  2. arXiv cs.CV TIER_1 English(EN) · Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li ·

    Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

    arXiv:2605.21487v2 Announce Type: replace Abstract: Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such strategy requires complex mult…

  3. arXiv cs.CV TIER_1 English(EN) · Ying Tai ·

    VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

    Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image ed…

  4. arXiv cs.CV TIER_1 English(EN) · Hongzhu Yi, Yujia Yang, Yuanxiang Wang, Tong Li, Zhenyu Guan, Tianyu Zong, Jiahuan Chen, Chenxi Bao, Tiankun Yang, Haopeng Jin, Yixuan Yuan, Xinming Wang, Tao Yu, Ruilin Gao, Ruiwen Tao, Haijin Liang, Jin Ma, Jinwen Luo, Yeshani, Xinyu Zuo, Jungang Xu ·

    VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

    arXiv:2602.00122v2 Announce Type: replace Abstract: In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplore…

  5. arXiv cs.CV TIER_1 English(EN) · Huanyu Zhang, Xuehai Bai, Chengzu Li, Chen Liang, Haochen Tian, Haodong Li, Ruichuan An, Yifan Zhang, Anna Korhonen, Zhang Zhang, Liang Wang, Tieniu Tan ·

    How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

    arXiv:2602.01851v2 Announce Type: replace Abstract: Recent generative models have achieved remarkable progress in image editing. However, existing systems and benchmarks remain largely text-guided. In contrast, human communication is inherently multimodal, where visual instructio…

  6. Together AI blog TIER_1 English(EN) ·

    FLUX.1 Kontext models: Character consistency and precise image editing without fine-tuning