English(EN) Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt?

Gemma 4 12B 在大型提示下难以关注音频

作者 PulseAugur 编辑部 · [1 个来源] · 2026-06-10 06:51

用户在使用 Google 的 Gemma 4 12B 统一模型时遇到了问题，该模型旨在同时处理音频、视觉和文本。虽然该模型在短文本提示下能很好地响应音频，但在遇到大型、密集的系统提示时，它似乎会失去关注语音的能力。在多个服务框架中都观察到了这种限制，这表明在处理竞争性输入时，模型架构或注意力机制可能存在问题。 AI

影响凸显了统一多模态模型在处理长上下文时可能存在的局限性，影响了语音助手开发。

排序理由用户报告的特定模型功能问题。[lever_c_demoted from research: ic=1 ai=1.0]

在 r/LocalLLaMA 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。我们如何撰写摘要 →

报道来源 [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Think_Illustrator188 · 2026-06-10 06:51

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt?

<div class="md">I'm trying to use Gemma 4 12B — the new encoder-free unified model (audio/vision/text in one) — for a one-pass audio → response voice assistant: feed the recorded WAV + system prompt and get the reply back as tex…

报道来源 [1]

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt?

相关实体

相关话题