New benchmarks and models advance AI image editing capabilities
By PulseAugur Editorial · [12 sources]
Researchers have introduced new datasets and benchmarks for image editing models that target gaps in current systems. VINS-120K is a large-scale dataset for instruction-based ultra-high-resolution (UHR) image editing, while VDE Bench focuses on editing visual documents that contain dense text in multiple languages. A third benchmark, VIBE, assesses how well models follow visual instructions; it finds that proprietary models currently outperform open-source alternatives but still struggle with complex tasks. Separately, Together AI has made available the FLUX.1 Kontext models from Black Forest Labs, which perform in-context image generation and editing from both text and image prompts without requiring fine-tuning.
AI
IMPACT
New benchmarks and models are pushing the boundaries of AI image editing, enabling more precise control and higher resolutions.
RANK_REASON
The cluster contains multiple research papers introducing new benchmarks and datasets for image editing, alongside a product launch of new models.
arXiv:2604.08213v2 Announce Type: replace-cross Abstract: High-quality source-target image pairs with precise editing instructions are essential for instruction-guided image editing, yet constructing such training triplets at scale remains costly. Recent pipelines often rely on v…
arXiv:2604.18170v2 Announce Type: replace-cross Abstract: LLMs edit text and code by autoregressively regenerating the full output, even when most tokens appear verbatim in the input. We study Copy-as-Decode, a decoding-layer mechanism that recasts edit generation as structured d…
arXiv cs.CV
TIER_1 · English (EN) · Yuke Li, Lianli Gao, Ji Zhang, Pengpeng Zeng, Lichuan Xiang, Hongkai Wen, Heng Tao Shen, Jingkuan Song
arXiv:2512.01382v4 Announce Type: replace Abstract: Exemplar-guided Image Editing (EIE) aims to modify a source image according to a visual reference. Existing approaches often require large-scale pre-training to learn relationships between the source and reference images, incurr…
arXiv:2605.24805v1 Announce Type: new Abstract: Large-scale controllable 3D assets are critical for computer graphics, embodied AI, robotics, and interactive content creation, yet creating diverse 3D assets remains challenging due to the high cost of manual modeling and rigging. …
arXiv cs.CV
TIER_1 · English (EN) · Yuanye Liu, Siyuan Zhou, Ke Zhang, Lei Li, Wei Chen, Xiahai Zhuang
arXiv:2605.24932v1 Announce Type: new Abstract: Pre-trained Vision Transformers (ViTs) are increasingly deployed for medical image classification. However, correcting their inevitable failure cases in dynamic clinical scenarios poses a critical challenge. Conventional fine-tuning…
arXiv cs.CV
TIER_1 · English (EN) · Mingyi Xu, Jinpeng Lin, Min Zhou, Tiezheng Ge, Ming Zeng
arXiv:2605.25568v1 Announce Type: new Abstract: Scribble-guided image editing allows users to combine simple scribble annotations with text prompts to specify both where and how an image should be edited, enabling flexible interaction with precise spatial control. However, existi…
arXiv:2605.23518v1 Announce Type: new Abstract: Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image ed…
arXiv:2602.01851v2 Announce Type: replace Abstract: Recent generative models have achieved remarkable progress in image editing. However, existing systems and benchmarks remain largely text-guided. In contrast, human communication is inherently multimodal, where visual instructio…
arXiv:2602.00122v2 Announce Type: replace Abstract: In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplore…