Brief · PulseAugur

SIGNIFICANT · Mastodon — mastodon.social Japanese (JA) · 13h

Fast, accurate vision-language model "Zamba2-VL" released, built on an architecture faster than the Transformer https://fed.brid.gy/r/https://gigazine.net/news/20260611-zamba2-vl-zyphra/

AI development company Zyphra has released Zamba2-VL, a new vision-language model built on a hybrid SSM-Transformer architecture. The architecture combines standard Transformer components with Mamba2, a state-space model (SSM), which enables faster image recognition than similarly sized Transformer-based models at comparable quality. Zyphra has made three versions of Zamba2-VL available as open models under the Apache License 2.0.

IMPACT Offers a faster alternative for vision-language tasks, potentially improving efficiency in multimodal AI applications.

Transformer
Mamba2
Zyphra
Zamba2-VL
SSM-Transformer