User struggles with Gemma 4 31B output quality on vLLM

By PulseAugur Editorial · [1 source] · 2026-05-27 22:21

A user reports that Google's Gemma 4 31B model, run locally with vLLM on A100 GPUs, produces poor-quality responses and malformed JSON. The same model, accessed through Google's API, produces correct structured output. The user suspects the vLLM configuration, since the other parameters and the model's precision (BF16) are the same in both setups.
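One way to put a number on the "malformed JSON" symptom is to count how many responses from each setup fail to parse. The sketch below is illustrative only: the function name `malformed_json_rate` and the sample strings are hypothetical and are not taken from the original post. It uses only Python's standard `json` module and makes no assumptions about how the responses were generated.

```python
import json


def malformed_json_rate(outputs):
    """Return the fraction of strings in `outputs` that fail to parse as JSON."""
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures += 1
    return failures / len(outputs)


# Hypothetical sample responses, invented for illustration.
local_outputs = [
    '{"label": "a"}',        # valid
    '{"label": "b"',         # truncated object
    'Sure! {"label": "c"}',  # prose wrapped around the JSON
]
api_outputs = ['{"label": "a"}', '{"label": "b"}', '{"label": "c"}']

print("local failure rate:", malformed_json_rate(local_outputs))
print("api failure rate:", malformed_json_rate(api_outputs))
```

Running the same prompts through both setups and comparing the two rates would show whether the gap is systematic or limited to a few prompts.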

IMPACT Troubleshooting a specific model deployment issue may help other users facing similar configuration challenges.

RANK_REASON User is reporting an issue with a specific tool (vLLM) when running a model, not a release or major industry event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Thagor · 2026-05-27 22:21

Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)

Okay, fun time: I got access to two NVLinked A100s for a research project. I benchmarked my work against the Gemma 4 31b-it available through Google, but my dataset is rather massive, so I need to run it on the "local" resources. Basica…
