Researchers have introduced DisciplineGen-1M, a large-scale dataset designed to improve the accuracy of AI models in generating and editing knowledge-intensive visual content. This dataset comprises 1.2 million samples across ten disciplines, including mathematics, physics, and computer science, and was constructed using a framework that combines vector-graphics rendering, OCR-based editing, and programmatic synthesis. Experiments using this dataset have shown significant improvements on discipline-specific benchmarks like GenExam and GRADE, suggesting that structured academic visual data is crucial for advancing AI capabilities beyond aesthetic appeal to verifiable, knowledge-grounded visual creation.
AI IMPACT Enhances AI's ability to generate and edit accurate, knowledge-grounded visual content across academic disciplines.
RANK_REASON The item is an academic paper introducing a new dataset and model.
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- DisciplineGen-1M
- GenExam
- Gotit.pub
- GRADE
- Hugging Face
- RISE
- ScienceCast
- WISE
AI-generated summary · Google Gemini · from 2 sources.