HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

New research improves image editing through product consistency and efficiency

By the PulseAugur editorial team · [11 sources] · 2026-06-11 00:00

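Researchers are developing new methods to improve instruction-based image editing, with a focus on preserving product identity and increasing efficiency. The "ProductConsistency" project introduces a new dataset and benchmark to help models preserve product features and branding, reducing the character error rate of the Qwen-Image-Edit-2511 model by a factor of 5. Meanwhile, the "Moebius" framework offers a lightweight image inpainting solution with fewer parameters and faster inference, with results comparable to those of large models. "HiLo-Token" addresses the latency of Diffusion Transformers in image editing by adaptively allocating tokens according to spatial frequency, achieving significant speedups without quality loss. In addition, "Thinking in Boxes" provides a precise 3D editing interface that uses 3D boxes to control transformations, while "BindEdit" tackles attention leakage in multi-object editing scenarios.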
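For readers unfamiliar with the metric, character error rate (CER) is conventionally the edit distance between the reference text and the text read from the output, divided by the reference length. The sketch below shows that standard definition only; the summary does not say how ProductConsistency extracts text or computes its score, so the function names and the example brand string are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # ref_char missing from the hypothesis
                curr[j - 1] + 1,     # extra hyp_char in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Standard CER: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: label text before editing vs. text read back
# from an edited image in which one character was corrupted.
example_cer = character_error_rate("ACME", "ACNE")
```

Under this definition, a fivefold reduction would mean, for example, going from 0.25 to 0.05 on the same reference text.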

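Impact: These advances aim to make image editing more precise, efficient, and accessible, and may affect creative industries and user-facing AI applications.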

Ranking rationale: Multiple research papers introduce new methods and benchmarks for image editing tasks.


AI-generated summary · Google Gemini · from 11 sources.

Sources [11]

arXiv cs.AI TIER_1 English(EN) · Mukund Khanna, Raj Singh Yadav, Kunal Singh · 2026-06-18 04:00

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

arXiv:2606.19103v1 Announce Type: cross Abstract: Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and text…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-17 00:00

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 00:00

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.
arXiv cs.CV TIER_1 English(EN) · Pradhaan S Bhat, Naveen Chandra R, Rishubh Parihar, Vaibhav Vavilala, R. Venkatesh Babu, D. A. Forsyth, Anand Bhattad · 2026-06-19 04:00

Thinking in Boxes: 3D Editing in Real Images Made Easy

arXiv:2606.20556v1 Announce Type: new Abstract: Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but on…
arXiv cs.CV TIER_1 English(EN) · Anand Bhattad · 2026-06-18 17:59

Thinking in Boxes: 3D Editing in Real Images Made Easy

Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating appr…
arXiv cs.CV TIER_1 English(EN) · Xinggang Wang · 2026-06-17 15:35

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structur…
arXiv cs.CV TIER_1 English(EN) · Kunal Singh · 2026-06-17 14:16

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements are critical, current open and closed…
arXiv cs.CV TIER_1 English(EN) · Kibeom Hong · 2026-06-17 10:32

BindEdit: Taming Attention Leakage for Precise Multi-Object Image Editing

Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial…
arXiv cs.CV TIER_1 English(EN) · Yiwei Ma, Ke Ye, Weihuang Lin, Jiayi Ji, Xiaoshuai Sun, Tat-Seng Chua, Rongrong Ji · 2026-06-16 04:00

An Extensive Benchmark for Single-round and Multi-round Instruction-based Image Editing

arXiv:2606.15570v1 Announce Type: new Abstract: In recent years, there have been notable advancements in the area of instruction-based image editing (IIE), which focuses on the automatic alteration of input images using a model. Nevertheless, assessing the effectiveness of these …
arXiv cs.CV TIER_1 English(EN) · Minghan Li, Jeremy Moebel, Mengyu Wang · 2026-06-15 04:00

Rethinking One-Step Image Editing through ChordEdit: Reproduction, Simplification, and New Insights

arXiv:2606.14042v1 Announce Type: new Abstract: One-step image editing is important for making text-guided editing fast, practical, and easy to deploy, but its underlying mechanism is still not fully understood. We revisit ChordEdit through reproduction, ablation, and simplificat…
arXiv cs.CV TIER_1 English(EN) · Mengyu Wang · 2026-06-12 02:35

Rethinking One-Step Image Editing through ChordEdit: Reproduction, Simplification, and New Insights

One-step image editing is important for making text-guided editing fast, practical, and easy to deploy, but its underlying mechanism is still not fully understood. We revisit ChordEdit through reproduction, ablation, and simplification. Our analysis shows that a) the chord window…
