PulseAugur

Kwai releases Keye-VL-2.0 for long-video understanding

Kwai has released Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed for long-video understanding and agentic intelligence. The model uses DeepSeek Sparse Attention to handle contexts of up to 256K, which lets it capture key frames and temporal dependencies in hour-long videos. It also applies Cross-Modal Multi-Teacher On-Policy Distillation to improve multi-task alignment and agent collaboration across scenarios. According to the technical report, it achieves state-of-the-art performance on video understanding and temporal localization benchmarks.

IMPACT Could improve long-video comprehension and agent collaboration in open-source multimodal systems, potentially accelerating development of multimodal AI applications.

RANK_REASON The cluster contains a technical report, posted on arXiv, that details a new open-source multimodal foundation model.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1 English(EN)

    Kwai Keye-VL-2.0 Technical Report

    Kwai Keye-VL-2.0-30B-A3B is an open-source Mixture-of-Experts multimodal foundation model that enables long-video understanding and agentic intelligence through DeepSeek Sparse Attention and specialized training infrastructure.

  2. arXiv cs.CV TIER_1 English(EN) · Kwai Keye Team, Bin Wen, Changyi Liu, Chengru Song, Chongling Rao, Guowang Zhang, Han Li, Haonan Fan, Hengrui Ju, Jiankang Chen, Jiapeng Chen, Jiawei Yuan, Kaixuan Yang, Kaiyu Jiang, Kun Gai, Lingzhi Zhou, Na Nie, Sen Na, Tianke Zhang, Tingting Gao, Xuan…

    Kwai Keye-VL-2.0 Technical Report

    arXiv:2606.10651v1. Abstract: We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, info…

  3. arXiv cs.CV TIER_1 English(EN) · Ruilin Zhang

    Kwai Keye-VL-2.0 Technical Report

    We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computationa…