Kwai-Keye/Keye-VL-2.0-30B-A3B-GGUF · Hugging Face
Kwai-Keye has released Keye-VL-2.0-30B-A3B, a new 30-billion-parameter multimodal model designed for advanced video understanding and agent tasks. The model is especially strong at temporal localization, matching or surpassing Gemini-3-Flash on video benchmarks. It supports hour-long video contexts through its DSA-Native Long-Context Architecture. Keye-VL-2.0-30B-A3B also includes a high-efficiency inference and training stack, robust post-training for reliable reasoning, and built-in agent abilities such as code execution and tool use.

AI IMPACT: Sets a new state of the art on video understanding benchmarks for a model of its scale, which may influence future multimodal agent development.