Researchers have developed a method for compressing State Space Models (SSMs) such as Mamba-2, reducing their memory footprint for edge deployment. Using grouped quantization-aware training (QAT) with knowledge distillation from a pre-trained FP16 model, they compressed Mamba-2 1.3B to 744 MB, a 3.61x reduction. The approach achieves competitive zero-shot accuracy with a much smaller token budget than previous methods, and the work also identifies a novel instability, "zero-ratio collapse", that is unique to QAT from pre-trained SSMs.
AI IMPACT Enables more efficient deployment of State Space Models on edge devices by significantly reducing memory footprint.
RANK_REASON The cluster describes a new research paper detailing a novel method for model compression.
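As a rough illustration of the grouped quantization named in the summary, the sketch below applies a quantize-then-dequantize ("fake quantization") step with one scale per group of weights. In QAT this kind of step typically runs in the forward pass so the model learns to tolerate rounding error. The function name, the symmetric max-abs scaling, the 4-bit width, and the group size are all assumptions for illustration; the summary does not state the paper's actual bit width, group size, or quantizer design, and the distillation loss is not shown.

```python
def fake_quantize_grouped(weights, group_size=4, bits=4):
    """Quantize-dequantize a flat list of weights, one scale per group.

    Hypothetical helper: symmetric max-abs scaling is an assumption,
    not the scheme confirmed by the summarized paper.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a positive multiple of group_size")

    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit integers
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Each group gets its own scale, so one outlier only coarsens
        # the resolution of its own group rather than the whole tensor.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            q = max(-qmax, min(qmax, round(w / scale)))  # integer code
            dequantized.append(q * scale)                # back to float
    return dequantized


if __name__ == "__main__":
    w = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
    print(fake_quantize_grouped(w, group_size=4, bits=4))
```

With this scheme the reconstruction error of every weight is at most half of its group's scale, which is why smaller groups generally preserve accuracy better at the cost of storing more scales.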
AI-generated summary · Google Gemini · from 2 sources.