PulseAugur

Google DeepMind releases encoder-free multimodal model Gemma 4 12B

Google DeepMind has released Gemma 4 12B, an open-weights multimodal model designed for efficient local deployment. The 11.95-billion-parameter model accepts text, images, audio, and video as input and returns text, processing all modalities through a single unified pathway instead of relying on separate vision and audio encoders. According to the release coverage, this design lets it run on devices with as little as 16GB of memory, making it suitable for offline applications such as transcription, summarization, and local coding assistants.
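The 16GB figure can be sanity-checked with simple arithmetic on the parameter count. The sketch below is a rough estimate under stated assumptions: it counts model weights only (ignoring the KV cache, activations, and runtime overhead), treats the 16GB budget as 16 GiB, and uses common storage precisions that are illustrative, not ones the coverage says Gemma 4 12B ships in.

```python
# Rough weight-only memory estimate for an 11.95B-parameter model.
# Assumption: ignores KV cache, activations, and runtime overhead.

PARAMS = 11.95e9  # parameter count stated in the coverage

# Bytes per parameter for common storage precisions (illustrative assumption,
# not taken from the release).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


def fits(params: float, precision: str, budget_gib: float = 16.0) -> bool:
    """True if the weights alone fit under the memory budget."""
    return weight_gib(params, BYTES_PER_PARAM[precision]) < budget_gib


if __name__ == "__main__":
    for name, b in BYTES_PER_PARAM.items():
        size = weight_gib(PARAMS, b)
        verdict = "fits" if fits(PARAMS, name) else "does not fit"
        print(f"{name}: {size:5.1f} GiB -> {verdict} in 16 GiB")
```

Under these assumptions, 16-bit weights alone (about 22.3 GiB) would exceed a 16GB budget, while 8-bit (about 11.1 GiB) or 4-bit (about 5.6 GiB) weights would fit. This suggests that running on such a device relies on some form of quantization; the coverage does not say which precision it assumes.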

IMPACT Enables more capable local multimodal applications by reducing computational overhead.

RANK_REASON New model release from a frontier lab with system card.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. dev.to (LLM tag) · Hassann · English

    What is Gemma 4 12B?

    Google shipped Gemma 4 12B on June 3, 2026. It is an open-weights, 11.95B-parameter model that accepts text, images, audio, and video as input, returns text, and can run on a laptop with 16GB of memory. The main implementation detail: it is a mid-sized multimodal model with na…