Baidu has introduced ERNIE-Image, an open-source text-to-image generation model based on an 8B single-stream DiT architecture. The model aims to compete with closed-source systems by improving pre-training data and supervision quality. ERNIE-Image uses a multi-stage data construction pipeline, including fine-grained categorization, detailed captioning, and aesthetic assessment, to build a stronger foundation for complex generation tasks. A lightweight Prompt Enhancer and an industrial-grade aesthetic model are also provided to support practical use and evaluation.
AI IMPACT This open-source release provides a strong foundation for text-to-image generation, potentially accelerating research and development in the AIGC community.
RANK_REASON The cluster contains a technical report detailing a new open-source model release.
AI-generated summary · Google Gemini · from 1 source.