Beyond Screenshots: Evaluating VLMs' Understanding of UI Animations

VLMs struggle to interpret UI animations, new dataset reveals

By the PulseAugur editorial team · [2 sources] · 2026-04-28 22:15

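Researchers have developed AniMINT, a new dataset of 300 annotated UI animation videos for evaluating how well vision-language models (VLMs) understand dynamic interfaces. Current VLMs can detect basic motion in UI animations but struggle to interpret the animations' purpose and meaning, showing a significant performance gap relative to humans. The study identifies key bottlenecks in VLM performance related to motion, context, and perceptual cues, pointing to directions for improving VLMs' capabilities in UI interaction.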

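Impact: Highlights the limitations of current VLMs in understanding dynamic UI elements and offers guidance for future research on multimodal AI for interface agents.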

Ranking rationale: Academic paper introducing a new dataset and evaluation method for VLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Chen Liang, Xirui Jiang, Naihao Deng, Eytan Adar, Anhong Guo · 2026-04-30 04:00

Beyond Screenshots: Evaluating VLMs' Understanding of UI Animations

arXiv:2604.26148v1 Announce Type: cross Abstract: AI agents operating on user interfaces must understand how interfaces communicate state and feedback to act reliably. As a core communicative modality, animations are increasingly used in modern interfaces, serving critical functi…
arXiv cs.CL TIER_1 English(EN) · Anhong Guo · 2026-04-28 22:15

Beyond Screenshots: Evaluating VLMs' Understanding of UI Animations

AI agents operating on user interfaces must understand how interfaces communicate state and feedback to act reliably. As a core communicative modality, animations are increasingly used in modern interfaces, serving critical functional purposes beyond mere aesthetics. Thus, unders…
