HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing
New research improves image editing through product consistency and efficiency
By the PulseAugur editorial team · [11 sources]
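Researchers are developing new methods to improve instruction-based image editing, with a focus on preserving product identity and improving efficiency. The "ProductConsistency" project introduces a new dataset and benchmark to help models preserve product features and branding, reducing the character error rate of the Qwen-Image-Edit-2511 model by a factor of five. Meanwhile, the "Moebius" framework offers a lightweight image inpainting solution with fewer parameters and faster inference that remains comparable to large models. "HiLo-Token" addresses the latency of Diffusion Transformers in image editing by adaptively allocating tokens according to spatial frequency, achieving significant speedups without quality loss. In addition, "Thinking in Boxes" provides a precise 3D editing interface that uses 3D boxes to control transformations, while "BindEdit" tackles attention leakage in multi-object editing scenarios.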
AI
arXiv:2606.19103v1 Announce Type: cross Abstract: Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and text…
A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies.
A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.
arXiv cs.CV
Pradhaan S Bhat, Naveen Chandra R, Rishubh Parihar, Vaibhav Vavilala, R. Venkatesh Babu, D. A. Forsyth, Anand Bhattad
arXiv:2606.20556v1 Announce Type: new Abstract: Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating appr…
While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structur…
Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements are critical, current open and closed…
Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial…
arXiv cs.CV
Yiwei Ma, Ke Ye, Weihuang Lin, Jiayi Ji, Xiaoshuai Sun, Tat-Seng Chua, Rongrong Ji
arXiv:2606.15570v1 Announce Type: new Abstract: In recent years, there have been notable advancements in the area of instruction-based image editing (IIE), which focuses on the automatic alteration of input images using a model. Nevertheless, assessing the effectiveness of these …
arXiv cs.CV
Minghan Li, Jeremy Moebel, Mengyu Wang
arXiv:2606.14042v1 Announce Type: new Abstract: One-step image editing is important for making text-guided editing fast, practical, and easy to deploy, but its underlying mechanism is still not fully understood. We revisit ChordEdit through reproduction, ablation, and simplification. Our analysis shows that a) the chord window…