New benchmarks and frameworks address the limitations of AI agents in website generation and remote sensing tasks

By the PulseAugur editorial team · [9 sources] · 2026-04-27 04:34

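Researchers have introduced InteractWeb-Bench, a new benchmark for evaluating multimodal large language models (MLLMs) on website generation tasks. It simulates real-world conditions in which user instructions may be ambiguous or contradictory, a setting the paper ties to "blind execution" by agents. Experiments on InteractWeb-Bench indicate that current frontier MLLM-based agents struggle with intent recognition and adaptive interaction in these scenarios. The benchmark includes an interactive environment with four actions (Clarify, Implement, Verify, and Submit) to support iterative refinement.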
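The four-action loop can be pictured with a toy sketch. Only the action names (Clarify, Implement, Verify, Submit) come from the summary. The episode state, the policy, and the function names below are hypothetical illustrations and are not the benchmark's actual interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Action names are from the summary; everything else here is assumed.
    CLARIFY = "Clarify"
    IMPLEMENT = "Implement"
    VERIFY = "Verify"
    SUBMIT = "Submit"


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for one website-generation task."""
    open_questions: int          # ambiguities left in the user instruction
    implemented: bool = False
    verified: bool = False
    trace: list = field(default_factory=list)


def choose_action(ep: ToyEpisode) -> Action:
    """Toy policy: ask before building, check before submitting."""
    if ep.open_questions > 0:
        return Action.CLARIFY
    if not ep.implemented:
        return Action.IMPLEMENT
    if not ep.verified:
        return Action.VERIFY
    return Action.SUBMIT


def step(ep: ToyEpisode, action: Action) -> None:
    """Apply an action to the toy episode state and record it."""
    ep.trace.append(action)
    if action is Action.CLARIFY:
        ep.open_questions -= 1
    elif action is Action.IMPLEMENT:
        ep.implemented = True
        ep.verified = False      # freshly written code has not been checked yet
    elif action is Action.VERIFY:
        ep.verified = True


def run_episode(open_questions: int, max_steps: int = 20) -> list:
    """Run the toy policy until it submits or the step budget runs out."""
    ep = ToyEpisode(open_questions=open_questions)
    for _ in range(max_steps):
        action = choose_action(ep)
        step(ep, action)
        if action is Action.SUBMIT:
            break
    return [a.value for a in ep.trace]


print(run_episode(open_questions=2))
```

In this toy setup, an agent that skipped the Clarify branch and went straight to Implement while questions remained open would be a rough analogue of the behavior the benchmark is designed to expose.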

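Impact: The new benchmark highlights the limitations of current multimodal agents in website generation and points to a need for better intent recognition and interaction capabilities.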

Ranking rationale: This is a research paper introducing a new benchmark for evaluating multimodal models.


AI-generated summary · Google Gemini · from 9 sources.

Sources [9]

arXiv cs.AI TIER_1 English(EN) · Qiyao Wang, Haoran Hu, Longze Chen, Hongbo Wang, Hamid Alinejad-Rokny, Yuan Lin, Min Yang · 2026-05-01 04:00

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

arXiv:2604.27419v1 · Announce type: new · Abstract: With the advancement of multimodal large language models (MLLMs) and coding agents, the website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assum…
arXiv cs.CL TIER_1 English(EN) · Min Yang · 2026-04-30 04:49

InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

With the advancement of multimodal large language models (MLLMs) and coding agents, the website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially for well-structured, informat…
arXiv cs.CV TIER_1 English(EN) · Guanchun Wang, Chenxiao Wu, Xiangrong Zhang, Zelin Peng, Jianxun Lai, Tianyang Zhang, Xu Tang · 2026-04-30 04:00

Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation

arXiv:2604.26221v1 · Announce type: new · Abstract: Open-vocabulary semantic segmentation (OVSS) in remote sensing images is a promising task that employs textual descriptions for identifying undefined land cover categories. Despite notable advances, existing methods typically employ…
arXiv cs.CV TIER_1 English(EN) · Hengtong Shen, Li Yan, Hong Xie, Yaxuan Wei, Xinhao Li, Wenfei Shen, Peixian Lv, Fei Tan · 2026-04-30 04:00

Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery

arXiv:2602.13780v2 · Announce type: replace · Abstract: Remote sensing (RS) change detection is essential for interpreting surface dynamics. Semantic change detection (SCD) further enables pixel-level understanding of multi-class transitions, yet remains sensitive to pseudo-changes i…
arXiv cs.CV TIER_1 English(EN) · Xu Tang · 2026-04-29 01:57

Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation

Open-vocabulary semantic segmentation (OVSS) in remote sensing images is a promising task that employs textual descriptions for identifying undefined land cover categories. Despite notable advances, existing methods typically employ a static inference paradigm, overlooking the di…
arXiv cs.CV TIER_1 English(EN) · Swadhin Das, Vivek Yadav · 2026-04-28 04:00

JSSFF: A Joint Structural-Semantic Fusion Framework for Remote Sensing Image Captioning

arXiv:2604.24031v1 · Announce type: new · Abstract: The encoder-decoder framework has become widely popular nowadays. In this model, the encoder extracts informative visual features from an input image, and the decoder employs a sequence-to-sequence formulation to generate the corres…
arXiv cs.CV TIER_1 English(EN) · Yanpei Gong, Beichen Zhang, Hao Wang, Zhaobo Qi, Xinyan Liu, Yuanrong Xu, Ruiyang Gao, Weigang Zhang · 2026-04-28 04:00

STAND: Semantic Anchoring Constraint with Dual-Granularity Disambiguation for Remote Sensing Image Change Captioning

arXiv:2604.23309v1 · Announce type: new · Abstract: Remote sensing image change captioning (RSICC) aims to describe the difference between two remote sensing images. While recent methods have explored video modeling, they largely overlook the inherent ambiguities in viewpoint, scale,…
arXiv cs.CV TIER_1 English(EN) · Ziyun Chen, Fan Liu, Liang Yao, Chuanyi Zhang, Yuye Ma, Wei Zhou · 2026-04-28 04:00

Evaluating Remote Sensing Image Captions Beyond Metric Biases

arXiv:2604.22855v1 · Announce type: new · Abstract: The core objective of image captioning is to achieve lossless semantic compression from visual signals into textual modalities. However, the reliance on manually curated reference texts for evaluation essentially forces models to mi…
arXiv cs.CV TIER_1 English(EN) · Vivek Yadav · 2026-04-27 04:34

JSSFF: A Joint Structural-Semantic Fusion Framework for Remote Sensing Image Captioning

The encoder-decoder framework has become widely popular nowadays. In this model, the encoder extracts informative visual features from an input image, and the decoder employs a sequence-to-sequence formulation to generate the corresponding textual description from these features.…
