Kwai-Keye releases Keye-VL-2.0-30B-A3B for long-video understanding

By PulseAugur Editorial · [1 source] · 2026-06-18 09:05

Kwai-Keye has released Keye-VL-2.0-30B-A3B, a 30-billion-parameter multimodal model built for video understanding and agent tasks. The model is strong at temporal localization, matching or surpassing Gemini-3-Flash on video benchmarks, and supports hour-long video contexts through its DSA-Native Long-Context Architecture. Keye-VL-2.0-30B-A3B also features a high-efficiency inference and training stack, post-training aimed at reliable reasoning, and built-in agent abilities such as code execution and tool use.

IMPACT Sets a new state of the art on video understanding benchmarks for its scale, which may influence future multimodal agent development.

RANK_REASON Frontier-lab model release with system card.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 (SW) · /u/jacek2023 · 2026-06-18 09:05

Kwai-Keye/Keye-VL-2.0-30B-A3B-GGUF · Hugging Face
