Researchers from CUHK-Shenzhen, Shenzhen Institute of Technology, and Tencent have introduced GameCraft-Bench, a benchmark that evaluates whether AI models can generate fully playable games. Unlike previous benchmarks that focused on static code or simpler web games, GameCraft-Bench uses the Godot 4 engine to assess end-to-end game development, including script writing, scene configuration, and asset integration. It relies on a multimodal model to judge the dynamic interactions and visual feedback of the generated games. The results show that even top-tier AI models struggle to generate complex interactive systems, scoring below 50% on average.
IMPACT Highlights significant limitations in current AI models' ability to create complex, interactive systems, indicating a need for advances beyond basic code generation.
RANK_REASON Introduction of a new benchmark for AI game generation.
AI-generated summary · Google Gemini · from 1 source.