PixelDiT — 1.3B pixel-space diffusion transformer, no VAE, 4GB VRAM, now 100% diffusers compatible with Qwen encoder support
PixelDiT is a newly released diffusion transformer with 1.3 billion parameters that operates directly in pixel space, with no VAE. It reportedly needs only 4GB of VRAM, is now fully compatible with the Hugging Face Diffusers library, and supports the Qwen encoder.
AI IMPACT Provides a new, efficient diffusion model for image generation tasks.
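
To put the 4GB VRAM figure in context, here is a rough sketch of how much memory the 1.3B weights alone would occupy at common precisions. This is back-of-the-envelope arithmetic, not a measurement from the release: the announcement does not say which precision or offloading strategy is used, and the estimate ignores activations and the Qwen encoder's own weights. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate for a 1.3B-parameter model.
# Assumption: only the transformer weights are counted; activations,
# the text encoder, and framework overhead are ignored.
PARAMS = 1.3e9  # parameter count stated in the announcement

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>16}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

Under these assumptions, the weights fit within 4GB at 16-bit precision but not at 32-bit, so a 4GB budget would most plausibly rely on half precision or lower, possibly combined with offloading.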