Researchers have introduced JAMER, a new dataset and benchmark designed to evaluate AI models on project-level code generation within professional game engines. Utilizing data from game jam competitions, JAMER focuses on the Godot engine and includes 8,133 verified projects. The benchmark assesses tasks like theme-driven generation and code completion using metrics such as compilation pass rates, Structural Completeness Score, and Behavioral Alignment Score. Initial evaluations show a significant drop in AI model performance as project complexity increases, highlighting architectural design as a key bottleneck.
AI IMPACT Highlights limitations in current AI code generation for complex project-level tasks, particularly in game development.
RANK_REASON The cluster describes a new dataset and benchmark for AI code generation, presented in an arXiv paper.
- arXiv
- Behavioral Alignment Score
- code agents
- Godot
- JamBench
- Jamer
- Structural Completeness Score
- Task2a
- game jam
- Hugging Face
AI-generated summary · Google Gemini · from 3 sources.