Google unveils Gemma 4 12B with encoder-free multimodal projection

By PulseAugur Editorial · [1 source] · 2026-06-16 11:17

Google has released Gemma 4 12B, an open multimodal model that uses an encoder-free projection method for images and audio. Rather than passing these inputs through separate encoders, the model projects them directly into its token space. It is designed to run on 16 GB of memory and reportedly achieves performance comparable to larger models.
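To make "projected directly into the token space" concrete, here is a minimal toy sketch in plain Python. It is not Gemma's implementation: the source gives no architectural details, so the patch size, the dimensions, the random weights, and the helper names `patchify`, `make_projection`, and `project` are all hypothetical. It only illustrates the general idea that raw image patches can be mapped to token-sized vectors with a single linear projection, with no separate vision encoder in front.

```python
import random


def patchify(image, patch):
    """Split a 2D grayscale image (a list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = [image[top + r][left + c]
                    for r in range(patch) for c in range(patch)]
            patches.append(flat)
    return patches


def make_projection(in_dim, d_model, seed=0):
    """Build a random in_dim x d_model matrix (stand-in for learned weights)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_model)]
            for _ in range(in_dim)]


def project(patches, weights):
    """Map each flattened patch to one d_model-sized vector via a matrix multiply."""
    d_model = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(vec)) for j in range(d_model)]
        for vec in patches
    ]


# Toy 8x8 "image" with pixel values in [0, 1]; all sizes are assumptions.
image = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)               # four patches of 16 values each
weights = make_projection(in_dim=16, d_model=6)  # hypothetical model width of 6
image_tokens = project(patches, weights)

# Stand-in for already-embedded text tokens of the same width.
text_tokens = [[0.0] * 6 for _ in range(3)]
sequence = image_tokens + text_tokens            # one shared token sequence

print(len(image_tokens), len(image_tokens[0]), len(sequence))
```

In this sketch the image contributes four vectors of the same width as the text embeddings, so both can sit in one sequence. A real model would learn the projection weights and would likely add positional information, which the toy omits.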

IMPACT This model's encoder-free approach could lead to more efficient and accessible multimodal AI.

RANK_REASON Frontier-lab model release with system card.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · pueding · 2026-06-16 11:17

Google Releases Gemma 4 12B: Encoder-Free Multimodal Projection

 What: Google released Gemma 4 12B, an open multimodal model whose headline trick is encoder-free multimodal projection — it turns images and audio into tokens by projecting them straight into the token space, instead …
