Google has released Gemma 4 12B, an open multimodal model that uses an encoder-free projection method for images and audio. Instead of routing each modality through a separate encoder, the model projects multimodal inputs directly into its token space. It is designed to run in 16 GB of memory and reportedly performs comparably to larger models.
AI IMPACT The encoder-free approach could make multimodal AI more efficient and accessible.
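To make the idea of projecting inputs directly into a token space concrete, here is a toy sketch in plain Python. It is not Gemma's actual architecture, which the summary does not describe. It assumes a common pattern: cut an image into fixed-size patches, flatten each patch, and multiply it by a single weight matrix so that it lands in the same embedding width as text tokens. Every name and size here (`patchify`, `project`, `D_MODEL`, `PATCH`) is hypothetical and chosen only for illustration.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def project(vectors, weight):
    """Multiply each input vector by a (d_in x d_model) weight matrix."""
    d_model = len(weight[0])
    return [[sum(x * w_row[k] for x, w_row in zip(vec, weight))
             for k in range(d_model)]
            for vec in vectors]

random.seed(0)
D_MODEL = 8  # hypothetical embedding width shared by text and image tokens
PATCH = 4    # hypothetical patch side length in pixels

# Toy 8 x 8 image and a randomly initialised projection (learned in a real model).
image = [[random.random() for _ in range(8)] for _ in range(8)]
weight = [[random.gauss(0, 0.02) for _ in range(D_MODEL)]
          for _ in range(PATCH * PATCH)]

# One linear map turns raw patches into token-sized vectors; no separate encoder stack.
image_tokens = project(patchify(image, PATCH), weight)

# Placeholder text embeddings, so both modalities can share one sequence.
text_tokens = [[0.0] * D_MODEL for _ in range(3)]
sequence = text_tokens + image_tokens
```

In this sketch the only image-specific parameters are the entries of `weight`, which is the rough intuition behind calling such a design encoder-free. A real model would learn that projection and would likely add positional information, neither of which is shown here.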
RANK_REASON Frontier-lab model release with system card.
AI-generated summary · Google Gemini · from 1 source.