PulseAugur

OmniGen-AR framework enables versatile image generation from multiple inputs

Researchers have introduced OmniGen-AR, a unified autoregressive framework for any-to-image generation. A single model synthesizes images from a range of inputs: text, segmentation maps, depth information, and existing images used for editing or video prediction. To prevent condition tokens from influencing content tokens, the framework employs Disentangled Causal Attention (DCA), a technique that separates attention mechanisms during training. OmniGen-AR is reported to achieve state-of-the-art performance on benchmarks such as GenEval and VBench.

IMPACT Introduces a unified framework for multi-modal image generation, potentially simplifying complex visual synthesis tasks.

RANK_REASON This is a research paper describing a new model and method.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Junke Wang, Xun Wang, Qiushan Guo, Peize Sun, Weilin Huang, Zuxuan Wu, Yu-Gang Jiang

    OmniGen-AR: AutoRegressive Any-to-Image Generation

    arXiv:2606.09156v1. Abstract: Autoregressive (AR) models have demonstrated strong potential in visual generation, offering superior performance with simple architectures and optimization objectives. However, existing methods are typically limited to single-modal…

  2. arXiv cs.CV TIER_1 English(EN) · Yu-Gang Jiang

    OmniGen-AR: AutoRegressive Any-to-Image Generation

    Autoregressive (AR) models have demonstrated strong potential in visual generation, offering superior performance with simple architectures and optimization objectives. However, existing methods are typically limited to single-modality conditions, e.g., text, restricting their ap…