PulseAugur
VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

New benchmarks and models advance AI image editing capabilities

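Researchers have introduced new benchmarks and datasets for evaluating image editing models, addressing limitations of current systems. VINS-120K provides a large-scale dataset for ultra-high-resolution image editing, while VDE Bench focuses on modifying visual documents that contain dense text in multiple languages. Another benchmark, VIBE, evaluates how well models follow visual instructions; its results show that proprietary models currently outperform open-source alternatives but still struggle with complex tasks. In addition, Together AI announced the FLUX.1 Kontext models, which perform in-context image generation and editing from text and image prompts without fine-tuning.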

Impact: New benchmarks and models are pushing the boundaries of AI image editing, enabling more precise control and higher resolution.

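Ranking rationale: This cluster contains several research papers introducing new benchmarks and datasets for image editing, along with a product launch of a new model.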


AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. arXiv cs.CV TIER_1 English(EN) · Zhizhou Chen, Shanyan Guan, Zhanxin Gao, En Ci, Yanhao Ge, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai ·

    VINS-120K: Ultra-High-Resolution Image Editing with a Large-Scale Dataset

    arXiv:2605.23518v1 (announce type: new). Abstract: Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-…

  2. arXiv cs.CV TIER_1 English(EN) · Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li ·

    Uni-Edit: Intelligent Editing Is a General Task for Unified Model Fine-Tuning

    arXiv:2605.21487v2 (announce type: replace). Abstract: Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such strategy requires complex mult…

  3. arXiv cs.CV TIER_1 English(EN) · Ying Tai ·

    VINS-120K: Ultra-High-Resolution Image Editing with a Large-Scale Dataset

    Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image ed…

  4. arXiv cs.CV TIER_1 English(EN) · Hongzhu Yi, Yujia Yang, Yuanxiang Wang, Tong Li, Zhenyu Guan, Tianyu Zong, Jiahuan Chen, Chenxi Bao, Tiankun Yang, Haopeng Jin, Yixuan Yuan, Xinming Wang, Tao Yu, Ruilin Gao, Ruiwen Tao, Haijin Liang, Jin Ma, Jinwen Luo, Yeshani, Xinyu Zuo, Jungang Xu ·

    VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

    arXiv:2602.00122v2 (announce type: replace). Abstract: In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplore…

  5. arXiv cs.CV TIER_1 English(EN) · Huanyu Zhang, Xuehai Bai, Chengzu Li, Chen Liang, Haochen Tian, Haodong Li, Ruichuan An, Yifan Zhang, Anna Korhonen, Zhang Zhang, Liang Wang, Tieniu Tan ·

    How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

    arXiv:2602.01851v2 (announce type: replace). Abstract: Recent generative models have achieved remarkable progress in image editing. However, existing systems and benchmarks remain largely text-guided. In contrast, human communication is inherently multimodal, where visual instructio…

  6. Together AI blog TIER_1 English(EN) ·

    FLUX.1 Kontext models: character consistency and precise image editing without fine-tuning