SEGA method improves high-resolution image generation in diffusion transformers

By PulseAugur Editorial · [3 sources] · 2026-05-21 00:00

Researchers have developed SEGA (Spectral-Energy Guided Attention), a training-free method that improves the resolution extrapolation of diffusion transformers (DiTs) used for text-to-image generation. During denoising, SEGA adaptively scales attention across rotary position embedding (RoPE) components according to the spatial-frequency structure of the latent representation. Compared with existing methods, the approach improves both structural coherence and fine-detail fidelity when generating images at resolutions beyond the training range.
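The sources summarized here do not give SEGA's formulas, so the following is only an illustrative sketch of the general idea of weighting attention per RoPE frequency component using a latent's spectral energy. It is not the paper's algorithm. The function names, the radial-band energy measure, the linear scaling rule, the `strength` parameter, and the mapping from spatial-frequency bands to RoPE components are all assumptions made for illustration.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per pair of channels."""
    idx = np.arange(head_dim // 2)
    return base ** (-2.0 * idx / head_dim)

def radial_band_energy(latent, n_bands):
    """Fraction of spectral energy in each of n_bands radial frequency bands."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(latent))) ** 2
    h, w = latent.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bands + 1)
    band = np.digitize(radius, edges[1:-1])  # band index in 0..n_bands-1
    energy = np.bincount(band.ravel(), weights=power.ravel(), minlength=n_bands)
    return energy / energy.sum()

def component_scales(energy, strength=0.5):
    """Hypothetical rule: deviate from 1 in proportion to each band's energy share."""
    uniform = 1.0 / energy.size
    return 1.0 + strength * (energy - uniform)  # mean of the result is exactly 1

def scaled_rope_logits(q, k, pos_q, pos_k, scales, base=10000.0):
    """Attention logits with each RoPE channel pair's contribution weighted by scales."""
    head_dim = q.shape[-1]
    freqs = rope_frequencies(head_dim, base)
    # Treat each channel pair as one complex number; RoPE is then a phase rotation.
    qc = (q[:, 0::2] + 1j * q[:, 1::2]) * np.exp(1j * pos_q[:, None] * freqs)
    kc = (k[:, 0::2] + 1j * k[:, 1::2]) * np.exp(1j * pos_k[:, None] * freqs)
    per_component = np.real(qc[:, None, :] * np.conj(kc[None, :, :]))
    return (per_component * scales).sum(axis=-1) / np.sqrt(head_dim)

rng = np.random.default_rng(0)
head_dim, n_tokens = 8, 6
latent = rng.standard_normal((16, 16))  # stand-in for one latent channel
energy = radial_band_energy(latent, n_bands=head_dim // 2)
# Assumed mapping: the lowest spatial band pairs with the lowest RoPE frequency,
# which is the last component, hence the reversal.
scales = component_scales(energy)[::-1]
q = rng.standard_normal((n_tokens, head_dim))
k = rng.standard_normal((n_tokens, head_dim))
pos = np.arange(n_tokens, dtype=float)
logits = scaled_rope_logits(q, k, pos, pos, scales)
print(logits.shape)
```

With all scales set to 1 the function reduces to ordinary RoPE attention logits, so the per-component weights are the only change in this sketch. In a real pipeline the energy estimate would presumably be recomputed at each denoising step, which is what would make the scaling adaptive.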

IMPACT Improves image generation quality at higher resolutions for diffusion transformer models.

RANK_REASON The cluster contains an academic paper detailing a new method for improving diffusion transformer performance.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

SEGA improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.

arXiv cs.CV TIER_1 English(EN) · Javad Rajabi, Kimia Shaban, Koorosh Roohi, David B. Lindell, Babak Taati · 2026-05-22 04:00

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

arXiv:2605.22668v1 · Abstract: Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate t…

arXiv cs.CV TIER_1 English(EN) · Babak Taati · 2026-05-21 16:09

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavi…
