PulseAugur
GAIR Paper 107 | Universities and Tencent Jointly Release GameCraft-Bench: AI Can Now Develop Games End to End, and About 40% of Claude Opus's Games Reach a Playable Level

AI game generation benchmark reveals top models struggle with playable game creation

Researchers from CUHK-Shenzhen, Shenzhen Institute of Technology, and Tencent have introduced GameCraft-Bench, a new benchmark designed to evaluate AI models' ability to generate fully playable games. Earlier benchmarks focused on static code or simpler web games. GameCraft-Bench instead uses the Godot 4 engine to assess end-to-end game development, including script writing, scene configuration, and asset integration. A multimodal model evaluates the dynamic interactions and visual feedback of each generated game. The results show that even top-tier AI models struggle to generate complex interactive systems, scoring below 50% on average.

IMPACT Highlights significant limitations in current AI models' ability to create complex, interactive systems, indicating a need for advances beyond basic code generation.

RANK_REASON Introduction of a new benchmark for AI game generation.

Read on 雷峰网 (Leiphone) →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. 雷峰网 (Leiphone) · TIER_1 · Chinese (ZH)

    GAIR Paper 107 | Universities and Tencent Jointly Release GameCraft-Bench: AI Can Now Develop Games End to End, and About 40% of Claude Opus's Games Reach a Playable Level
