ScoutVLA: UAV-Centric Active Perception via a Dual-Expert VLA Model for Open-World Embodied Question Answering

ScoutVLA model improves UAV question answering through active perception

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

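Researchers have introduced ScoutVLA, a novel dual-expert vision-language-action (VLA) model designed for aerial embodied question answering. By mimicking the "waggle dance" of worker bees, the model lets an unmanned aerial vehicle (UAV) actively adjust its viewpoint to gather fine-grained evidence, overcoming the limitations of existing systems. ScoutVLA uses a decoupled architecture with separate experts for semantic intent reasoning and continuous trajectory generation, and it is trained with a knowledge isolation mechanism to preserve multimodal reasoning ability. Field studies and simulations show that ScoutVLA significantly outperforms current state-of-the-art methods, with a 10.48-fold improvement in average strict success rate and a 7.72-fold improvement in average question-answering accuracy.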

Impact: Introduces a new embodied AI model architecture that could improve robot perception and task completion in complex environments.

Ranking rationale: This cluster describes a recent research paper on a new AI model and benchmark.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Wenhao Lu, Zhengqiu Zhu, Xiaofeng Wang, Xiaoran Zhang, Yatai Ji, Yong Zhao, Yue Hu, Yingzhen Nie, Jinlong Zhu, Zheng Zhu · 2026-06-16 04:00

ScoutVLA: UAV-Centric Active Perception via a Dual-Expert VLA Model for Open-World Embodied Question Answering

arXiv:2606.14772v1 Announce Type: cross Abstract: Aerial Embodied Question Answering (EQA) requires Unmanned Aerial Vehicles (UAVs) to actively perceive the environment and answer natural language questions. Existing outdoor EQA systems usually stop once the target enters the UAV…
