PulseAugur
QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

New QCalEval benchmark evaluates vision-language models on quantum calibration plots

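Researchers have introduced QCalEval, a new benchmark that evaluates how well vision-language models (VLMs) understand quantum computing calibration plots. The benchmark comprises 243 samples spanning a range of quantum computing experiment types, and models are evaluated in both zero-shot and in-context learning settings. Initial results indicate that frontier closed-source models perform well, whereas many open-weight models struggle with multi-image in-context learning, and supervised fine-tuning alone does not fully close the gap.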

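Impact: Establishes a new standard for evaluating VLMs in scientific domains and may guide future model development for specialized data interpretation.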

Ranking rationale: This is a research paper introducing a new benchmark for evaluating VLMs on a specific scientific task.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

    Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEv…

  2. arXiv cs.CV TIER_1 English(EN) · Shuxiang Cao, Zijian Zhang, Abhishek Agarwal, Grace Bratrud, Niyaz R. Beysengulov, Daniel C. Cole, Alejandro Gómez Frieiro, Elena O. Glen, Hao Hsu, Gang Huang, Raymond Jow, Greshma Shaji, Tom Lubowe, Ligeng Zhu, Luis Mantilla Calderón, Nicola Pancotti ·

    QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

    arXiv:2604.25884v1 (announce type: cross). Abstract: Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language…

  3. arXiv cs.CV TIER_1 English(EN) · Krysta Svore ·

    QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

    Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEv…