Google Releases Gemma 4 12B: Encoder-Free Multimodal Projection


By the PulseAugur editorial team · [1 source] · 2026-06-16 11:17

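Google has released Gemma 4 12B, an open multimodal model that handles images and audio with an encoder-free projection approach. Instead of routing these inputs through separate, dedicated encoders, the model projects them directly into its token space. The model is designed to run in 16 GB of memory, and its performance is reportedly comparable to that of larger models.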
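The summary describes the mechanism only at a high level, and the source excerpt is truncated, so the following is not Gemma 4's actual implementation. It is a minimal sketch, under stated assumptions, of what "projecting inputs directly into the token space" generally means: raw image patches (or fixed-length audio frames) are flattened and passed through a single linear map to the model's embedding width, then placed in the same sequence as text token embeddings, with no separate vision or audio encoder network in between. The patch size, frame length, embedding width, random stand-in weights, and the image-then-audio-then-text ordering are all hypothetical choices for illustration.

```python
import random
from typing import List

Vector = List[float]


def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W single-channel image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major into a vector of length patch * patch.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles


def frame_audio(samples: List[float], frame: int) -> List[Vector]:
    """Chunk a mono waveform into consecutive fixed-length frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        frames.append(chunk + [0.0] * (frame - len(chunk)))
    return frames


def project(vectors: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Apply one linear map y = W x + b to every input vector.

    weight has shape (d_model, input_dim), so each output has length d_model,
    the same width as the text token embeddings.
    """
    return [[sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
            for vec in vectors]


def random_linear(input_dim: int, d_model: int, rng: random.Random):
    """Hypothetical stand-in for learned projection weights (not trained values)."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(input_dim)] for _ in range(d_model)]
    bias = [0.0] * d_model
    return weight, bias


if __name__ == "__main__":
    rng = random.Random(0)
    d_model = 8  # hypothetical embedding width

    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4x4 image
    image_tokens = project(patchify(image, 2), *random_linear(4, d_model, rng))

    audio = [rng.uniform(-1.0, 1.0) for _ in range(10)]  # toy 10-sample waveform
    audio_tokens = project(frame_audio(audio, 4), *random_linear(4, d_model, rng))

    text_tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]

    # One flat sequence: every element is a d_model-wide vector, whatever its modality.
    sequence = image_tokens + audio_tokens + text_tokens
    print(len(image_tokens), len(audio_tokens), len(text_tokens), len(sequence))  # prints "4 3 3 10"
```

In a real model the projection weights would be learned together with the rest of the network, and extra details such as positional or modality information would likely be needed; the source does not describe those details, so they are omitted here.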

Impact: The model's encoder-free approach could make multimodal AI more efficient and more accessible.
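The summary does not say what numeric precision the 16 GB figure assumes. As a rough check, the weights alone take about params × bits ÷ 8 bytes. A minimal back-of-the-envelope sketch, assuming "12B" means exactly 12 billion parameters, using decimal gigabytes, and ignoring activations, the KV cache, and runtime overhead:

```python
# Back-of-the-envelope weight memory for a model with a nominal 12B parameters.
# Assumptions: "12B" is taken as exactly 12e9 parameters, 1 GB = 1e9 bytes, and
# only the weights are counted (no activations, KV cache, or runtime overhead).

NOMINAL_PARAMS = 12e9
BUDGET_GB = 16.0


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Storage needed for the weights, expressed in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("16-bit (e.g. bf16)", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(NOMINAL_PARAMS, bits)
        print(f"{label}: {gb:.1f} GB of weights, under {BUDGET_GB:.0f} GB budget: {gb < BUDGET_GB}")
```

Under these assumptions, 16-bit weights alone (about 24 GB) would exceed 16 GB, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights would leave headroom for everything else; the source does not say which precision Google's 16 GB figure refers to.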

Ranking rationale: Frontier-lab model release, accompanied by a system card.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to (LLM tag) · pueding · 2026-06-16 11:17

Google Releases Gemma 4 12B: Encoder-Free Multimodal Projection

 What: Google released Gemma 4 12B, an open multimodal model whose headline trick is encoder-free multimodal projection — it turns images and audio into tokens by projecting them straight into the token space, instead …