Users discuss Q1/Q2 quantization of large language models on r/LocalLLaMA

By PulseAugur Editorial · [1 source] · 2026-06-28 11:14

A discussion on the r/LocalLLaMA subreddit explores how usable heavily quantized large language models are, specifically Q1 and Q2 quantizations of models with 100 to 250 billion parameters. Users share their experiences running these low-bit quantizations for agentic coding, writing, and chat, and report issues such as looping and repetition. For reference, the thread lists several recent large models, including DeepSeek-V4-Flash, Qwen3-235B-A22B, and NVIDIA-Nemotron-3-Super-120B-A12B.
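The looping and repetition that users report can be checked for mechanically. The following is a minimal sketch, not something taken from the thread: it assumes plain whitespace tokenization, and `repeated_ngram_ratio` is a hypothetical helper name introduced here for illustration.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Return the fraction of word n-grams that repeat an earlier n-gram.

    Assumption: whitespace splitting is a good enough tokenizer for a
    rough looping check. Values near 0 mean little repetition; values
    near 1 mean the output is mostly the same phrases over and over.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    # Build every sliding window of n consecutive words.
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)
```

Running this over a model's output and flagging anything above a chosen threshold (the threshold itself would need tuning per task) gives a rough way to compare how often different quantizations of the same model fall into loops.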

IMPACT Provides insights into the practical performance and limitations of running large language models with aggressive quantization on consumer hardware.
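To see why aggressive quantization matters on consumer hardware, the weight footprint can be estimated as parameters times bits per weight divided by eight. This is a back-of-the-envelope sketch under stated assumptions: `weight_size_gib` is a hypothetical helper, the bit widths are nominal values chosen for illustration, and the estimate ignores the KV cache, activations, and the per-block scales that real quantization formats store, so actual files are usually somewhat larger.

```python
def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Estimate the size of the model weights alone, in GiB.

    Assumption: every parameter is stored at the same nominal bit width.
    Real quantized files mix precisions and add metadata, so treat the
    result as a lower bound rather than an exact file size.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative comparison for a 235B-parameter model.
    for bits in (16.0, 4.0, 2.0):
        print(f"{bits:>4} bits/weight: {weight_size_gib(235, bits):.1f} GiB")
```

Under these assumptions, a 235B-parameter model needs about 438 GiB for its weights at 16 bits per weight and about 55 GiB at 2 bits per weight, which is the gap that makes Q1 and Q2 variants attractive despite their quality loss.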

RANK_REASON Discussion on a subreddit about the practical use of quantized models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/pmttyji · 2026-06-28 11:14

How many of you do use Q1 or Q2 of Big models(100-250B)? How's it?

Sharing popular (also recent) models for reference. 151-250B: DeepSeek-V4-Flash, Step-3.X-Flash, Command-a-plus-05-2026, Laguna-M.1, MiniMax-M2.X, Qwen3-235B-A22B, …
