PulseAugur
A systematic evaluation of vision-language models for observational astronomical reasoning tasks

Vision-language models show mixed performance on astronomical reasoning tasks

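Researchers have developed AstroVLBench, a new benchmark for systematically evaluating vision-language models (VLMs) on observational astronomy tasks. The benchmark contains more than 4,100 instances spanning five distinct astronomical data modalities. An evaluation of six leading models shows that performance varies by data type: Gemini 3 Pro demonstrated the most consistent capability, although all models fell short of specialized methods.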

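Impact: Establishes baseline performance for VLMs in astronomy and highlights the current limitations of these models in grounding and reasoning for scientific applications.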

Ranking rationale: This is a research paper introducing a new benchmark for evaluating AI models on scientific tasks.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.AI · TIER_1 · Wenke Ren, Hengxiao Guo, Wenwen Zuo, Xiaoman Zhang

    A systematic evaluation of vision-language models for observational astronomical reasoning tasks

    arXiv:2604.24589v1 Announce Type: new Abstract: Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities remains untested. We present Astr…

  2. arXiv cs.AI · TIER_1 · Xiaoman Zhang

    A systematic evaluation of vision-language models for observational astronomical reasoning tasks

    Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities remains untested. We present AstroVLBench, a comprehensive benchmark comprising o…