PulseAugur
Kwai Keye-VL-2.0 Technical Report (English)

Kwai releases Keye-VL-2.0 for long-video understanding

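Kwai has released Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed for long-video understanding and agentic intelligence. The model uses DeepSeek Sparse Attention to handle contexts of up to 256K, capturing key frames and temporal dependencies in videos up to an hour long. It also applies cross-modal multi-teacher on-policy distillation to strengthen multi-task alignment and agent collaboration across a range of scenarios. In the reported evaluations, it achieves state-of-the-art results on video understanding and temporal grounding benchmarks.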

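Impact: Enables advanced agent collaboration and improved long-video understanding, which may accelerate the development of multimodal AI applications.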

Ranking rationale: This cluster contains a technical report, published on arXiv, that details a new open-source multimodal foundation model.


AI-generated summary · Google Gemini · based on 3 sources.

Sources (3)

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    Kwai Keye-VL-2.0 Technical Report

    Kwai Keye-VL-2.0-30B-A3B is an open-source Mixture-of-Experts multimodal foundation model that enables long-video understanding and agentic intelligence through DeepSeek Sparse Attention and specialized training infrastructure.

  2. arXiv cs.CV · TIER_1 · English (EN) · Kwai Keye Team, Bin Wen, Changyi Liu, Chengru Song, Chongling Rao, Guowang Zhang, Han Li, Haonan Fan, Hengrui Ju, Jiankang Chen, Jiapeng Chen, Jiawei Yuan, Kaixuan Yang, Kaiyu Jiang, Kun Gai, Lingzhi Zhou, Na Nie, Sen Na, Tianke Zhang, Tingting Gao, Xuan…

    Kwai Keye-VL-2.0 Technical Report

    arXiv:2606.10651v1 · Abstract: We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, info…

  3. arXiv cs.CV · TIER_1 · English (EN) · Ruilin Zhang

    Kwai Keye-VL-2.0 Technical Report

    We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computationa…