TurtleAI: Benchmarking Multimodal Models for Visual Programming in Turtle Graphics

New benchmark shows vision-language models struggle with visual programming tasks

By the PulseAugur editorial team · [2 sources] · 2026-06-02 13:25

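Researchers have introduced TurtleAI, a new benchmark designed to evaluate vision-language models (VLMs) on education-oriented visual programming tasks in Turtle Graphics. The benchmark comprises 823 tasks, and the results show that more than 20 leading VLMs, including GPT-5 and GPT-4o, struggle significantly, with success rates typically below 30%. The proposed data generation technique, combined with fine-tuning of Qwen2-VL-72B, yields a notable improvement of about 20% on real-world tasks. The findings underscore the challenges models face in spatial reasoning and precise visual replication.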
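To make "precise visual replication" concrete, here is a minimal Python sketch of a turtle-style task. It is an illustration only: the command format, the `run_turtle` and `same_drawing` helpers, and the exact-match check are all assumptions made for this example, and the summary does not describe how TurtleAI actually represents or scores its tasks.

```python
import math

def run_turtle(commands):
    """Simulate a tiny turtle and return the points it visits.

    `commands` is a list of (name, value) pairs: "forward" takes a
    distance, "left"/"right" take degrees. This mini-language is
    hypothetical, not the benchmark's actual task format.
    """
    x, y, heading = 0.0, 0.0, 0.0  # start at the origin, facing east
    points = [(x, y)]
    for name, value in commands:
        if name == "forward":
            x += value * math.cos(math.radians(heading))
            y += value * math.sin(math.radians(heading))
            points.append((round(x, 6), round(y, 6)))
        elif name == "left":
            heading = (heading + value) % 360
        elif name == "right":
            heading = (heading - value) % 360
        else:
            raise ValueError(f"unknown command: {name}")
    return points

def same_drawing(a, b, tol=1e-6):
    """Assumed success check: both paths visit the same points in order."""
    if len(a) != len(b):
        return False
    return all(
        abs(ax - bx) <= tol and abs(ay - by) <= tol
        for (ax, ay), (bx, by) in zip(a, b)
    )

# Target: a square with side 100.
square = [("forward", 100), ("left", 90)] * 4
# A candidate that is off by one degree per turn.
near_miss = [("forward", 100), ("left", 89)] * 4

print(same_drawing(run_turtle(square), run_turtle(square)))     # True
print(same_drawing(run_turtle(square), run_turtle(near_miss)))  # False
```

Under this assumed check, a one-degree error per turn is enough to fail the task, which hints at why tasks that demand exact angles and distances can be hard for models that reason about images only approximately.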

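Impact: The work highlights the limitations of current VLMs in education-oriented visual programming and points to areas for future model development.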

Ranking rationale: This cluster contains a research paper introducing a new benchmark for evaluating AI models.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Chao Wen, Jacqueline Staub, Adish Singla · 2026-06-03 04:00

TurtleAI: Benchmarking Multimodal Models for Visual Programming in Turtle Graphics

arXiv:2606.03626v1 Announce Type: cross Abstract: Vision-language models (VLMs) have been explored for visual programming, where they generate code to solve visual tasks. However, most prior work focuses on visual programming for productivity; it remains unclear how well current …

arXiv cs.AI TIER_1 English(EN) · Adish Singla · 2026-06-02 13:25

TurtleAI: Benchmarking Multimodal Models for Visual Programming in Turtle Graphics

Vision-language models (VLMs) have been explored for visual programming, where they generate code to solve visual tasks. However, most prior work focuses on visual programming for productivity; it remains unclear how well current VLMs perform on education-oriented visual programm…
