Has anyone experimented with stabilizing low quant models with lower temp and top p?

LocalLLaMA user discusses the stability of quantized LLMs

By the PulseAugur editorial team · [1 source] · 2026-05-30 19:31

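A user on the r/LocalLLaMA subreddit is seeking advice on stabilizing large, heavily quantized language models. They plan to try lowering the temperature and top-p sampling parameters to reduce the erratic output these models produce, particularly when run with limited VRAM.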
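To make the two parameters concrete, here is a minimal sketch of how temperature and top-p (nucleus) sampling typically reshape a next-token distribution. It is not taken from the post and does not reflect any particular inference runtime; the function names `top_p_filter` and `sample_token` and the example logits are hypothetical, chosen only for illustration.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=1.0):
    """Return {token_index: probability} after temperature scaling
    and top-p (nucleus) truncation. Hypothetical helper for illustration."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    indices = list(dist)
    weights = [dist[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Made-up logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.0, -1.0]
default_dist = top_p_filter(example_logits, temperature=1.0, top_p=0.9)
cautious_dist = top_p_filter(example_logits, temperature=0.5, top_p=0.9)
```

With these made-up logits, lowering the temperature from 1.0 to 0.5 at the same top-p shrinks the candidate set from three tokens to two, which is the mechanism the poster hopes will suppress unlikely tokens. Whether that actually stabilizes a heavily quantized model is the open question the post raises.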

Impact: Offers practical technical insight into optimizing the performance and stability of local LLMs.

Ranking rationale: A user discussion of a technical topic rather than a formal release or announcement.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/fragment_me · 2026-05-30 19:31

Has anyone experimented with stabilizing low quant models with lower temp and top p?

I was thinking about trying some bigger models out on my 80GB VRAM setup, but everything MoE is too slow with CPU offload. Otherwise there aren't many models that are purpose-built for 80GB VRAM. Most of the bigger models require using a heavily …