A discussion on Reddit explores the potential of 2-bit Quantization-Aware Training (QAT) for large Mixture-of-Experts (MoE) models. The user speculates that such models, ranging from 120 billion to 400 billion parameters, could be feasible on consumer hardware with 64-128 GB of RAM. The user acknowledges that 2-bit QAT would not match 8-bit or 16-bit performance, but suggests it might be a better alternative to training ternary LLMs from scratch and could still be viable for tasks like creative writing.
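The RAM claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper name `weight_footprint_gb` is hypothetical, and it counts raw weight storage alone, assuming exactly the stated bits per weight with no quantization scales, KV cache, activations, or runtime overhead. Real 2-bit formats usually store extra per-group metadata, so actual footprints would be somewhat larger.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter uses exactly `bits_per_weight` bits, with no
    per-group scales or other quantization metadata.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts and bit widths taken from the figures in the summary.
    for params in (120e9, 400e9):
        for bits in (2, 8, 16):
            gb = weight_footprint_gb(params, bits)
            print(f"{params / 1e9:.0f}B params at {bits}-bit: about {gb:.0f} GB")
```

Under these assumptions, a 120B-parameter model needs roughly 30 GB at 2 bits and a 400B-parameter model roughly 100 GB, which is consistent with the 64-128 GB range mentioned in the discussion, while the same models at 16 bits would need about 240 GB and 800 GB.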
AI-generated summary · Google Gemini · from 1 source.