PulseAugur
English(EN) VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

New benchmarks and models advance AI image editing capabilities

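Researchers have introduced new benchmarks and datasets for evaluating image editing models, addressing limitations of current systems. VINS-120K provides a large-scale dataset for ultra-high-resolution image editing, while VDE Bench focuses on modifying visual documents that contain dense text in multiple languages. Another benchmark, VIBE, evaluates how well models follow visual instructions; its results show that proprietary models currently outperform open-source alternatives but still struggle with complex tasks. In addition, Together AI launched the FLUX.1 Kontext models, which use text and image prompts for in-context image generation and editing without fine-tuning.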

Impact: New benchmarks and models are pushing the boundaries of AI image editing, enabling more precise control and higher resolution.

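Ranking rationale: This cluster contains several research papers introducing new benchmarks and datasets for image editing, along with a product launch of a new model.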


AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. arXiv cs.CV TIER_1 English(EN) · Zhizhou Chen, Shanyan Guan, Zhanxin Gao, En Ci, Yanhao Ge, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai ·

    VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

    arXiv:2605.23518v1 (announce type: new). Abstract: Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-…

  2. arXiv cs.CV TIER_1 English(EN) · Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li ·

    Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

    arXiv:2605.21487v2 (announce type: replace). Abstract: Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such strategy requires complex mult…

  3. arXiv cs.CV TIER_1 English(EN) · Ying Tai ·

    VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

    Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image ed…

  4. arXiv cs.CV TIER_1 English(EN) · Hongzhu Yi, Yujia Yang, Yuanxiang Wang, Tong Li, Zhenyu Guan, Tianyu Zong, Jiahuan Chen, Chenxi Bao, Tiankun Yang, Haopeng Jin, Yixuan Yuan, Xinming Wang, Tao Yu, Ruilin Gao, Ruiwen Tao, Haijin Liang, Jin Ma, Jinwen Luo, Yeshani, Xinyu Zuo, Jungang Xu ·

    VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

    arXiv:2602.00122v2 (announce type: replace). Abstract: In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplore…

  5. arXiv cs.CV TIER_1 English(EN) · Huanyu Zhang, Xuehai Bai, Chengzu Li, Chen Liang, Haochen Tian, Haodong Li, Ruichuan An, Yifan Zhang, Anna Korhonen, Zhang Zhang, Liang Wang, Tieniu Tan ·

    How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

    arXiv:2602.01851v2 (announce type: replace). Abstract: Recent generative models have achieved remarkable progress in image editing. However, existing systems and benchmarks remain largely text-guided. In contrast, human communication is inherently multimodal, where visual instructio…

  6. Together AI blog TIER_1 English(EN) ·

    FLUX.1 Kontext models: Character consistency and precise image editing without fine-tuning