PulseAugur

Reddit users discuss 2-bit QAT for large MoE models

A Reddit discussion explores the potential of 2-bit Quantization Aware Training (QAT) for large Mixture-of-Experts (MoE) models. The poster speculates that such models, ranging from 120 billion to 400 billion parameters, could be feasible to run on consumer hardware with 64-128 GB of RAM. The poster acknowledges that a 2-bit QAT model would not match 8-bit or 16-bit performance, but suggests it might be a better alternative to training ternary LLMs from scratch and could still be viable for tasks like creative writing.
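As a rough check on the 64-128 GB figure, the weight storage for a given parameter count and bit width can be estimated with simple arithmetic. The sketch below is an illustration, not something from the Reddit post: the helper name `weight_memory_gb` is hypothetical, and it counts only raw weight bits, ignoring quantization scales, the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in GB (1 GB = 10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no per-group scales, zero points, or higher-precision layers.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Compare the model sizes mentioned in the discussion across bit widths.
    for params in (120, 400):
        for bits in (2, 4, 8, 16):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B parameters @ {bits}-bit: ~{gb:.0f} GB")
```

Under these assumptions, a 120B model at 2 bits needs about 30 GB for weights and a 400B model about 100 GB, which is consistent with the 64-128 GB range discussed, though with limited headroom at the upper end.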

RANK_REASON This is a user discussion on a forum about a potential technical approach, not a release or research paper.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 (CA) · /u/silenceimpaired ·

    2-bit QAT model releases

    So far, model releases that take advantage of Quantization Aware Training (QAT) have been focused on 4-bit.

    I’m curious what could be accomplished with a larger MoE model around 120b up to 400b. Obviously the model could not approa…