Google DeepMind releases encoder-free multimodal Gemma 4 12B

By PulseAugur Editorial · [1 source] · 2026-06-03 18:46

Google DeepMind has released Gemma 4 12B, a 12-billion-parameter multimodal model that processes text, image, audio, and video without separate encoders. This encoder-free architecture lets the model run complex agentic workflows on consumer hardware with as little as 16 GB of RAM. The model is released under the Apache 2.0 license; weights can be downloaded from Hugging Face and Kaggle, and several inference stacks support local deployment.
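
The 16 GB figure is easier to judge against a rough estimate of weight memory. The sketch below is only an illustration: the parameter count comes from the summary, but the precisions are assumed, since the source does not say which quantization the 16 GB claim relies on. It also ignores activations, the KV cache, and runtime overhead.

```python
# Back-of-the-envelope memory needed to hold the weights of a
# 12-billion-parameter model. The precisions listed are illustrative
# assumptions, not figures taken from the release.
PARAMS = 12e9  # parameter count stated in the summary


def weight_gib(params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return params * bits_per_param / 8 / 2**30


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, while 8-bit or 4-bit weights would fit with room left for the rest of the runtime.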

IMPACT Enables advanced multimodal AI capabilities on consumer hardware, potentially accelerating local agent development and deployment.

RANK_REASON New model release from a frontier lab with system card details.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

MarkTechPost TIER_1 English(EN) · Asif Razzaq · 2026-06-03 18:46

Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native audio that runs on a 16 GB laptop

Gemma 4 12B feeds vision and audio straight into the LLM backbone, running locally under an Apache 2.0 license.
