Researchers have developed a method for compressing State Space Models (SSMs) such as Mamba-2 to reduce their memory footprint for edge deployment. Using grouped quantization-aware training (QAT) with knowledge distillation from a pre-trained FP16 model, they compressed Mamba-2 1.3B to 744 MB while reaching 48.1% zero-shot accuracy, with significantly less training data and compute than previous methods. The work also identified a previously unreported instability, zero-ratio collapse, which differs from the issues seen in Transformer models and requires SSM-specific correction strategies.
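To make the two ingredients named above concrete, here is a minimal NumPy sketch of grouped fake quantization (the forward pass used in QAT) and a distillation loss against a full-precision teacher. It is not the paper's implementation: the function names, the 4-bit width, the group size of 64, the symmetric per-group scale, and the temperature are all assumptions chosen for illustration, and the backward pass (typically a straight-through estimator) is omitted.

```python
import numpy as np

def fake_quantize_grouped(w, group_size=64, bits=4):
    """Quantize then dequantize weights, using one scale per group.

    Assumed scheme (not from the source): symmetric signed integers,
    with groups taken along the flattened weight array.
    """
    assert w.size % group_size == 0, "weight count must divide into groups"
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax, qmax)
    return (q * scale).reshape(w.shape)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits / temperature)
    q = softmax(student_logits / temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Toy usage: a single linear layer stands in for the model.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 256))                      # full-precision "teacher" weights
x = rng.normal(size=(8, 256))                       # a batch of inputs
W_q = fake_quantize_grouped(W)                      # quantized "student" weights
loss = distillation_loss(x @ W_q.T, x @ W.T)        # how far the student drifts
```

In actual QAT the loss would be backpropagated through the rounding step with a gradient approximation so that the full-precision weights learn to tolerate quantization. As a rough size check, 744 MB for a nominal 1.3 billion parameters is about 4.6 bits per parameter on average, assuming MB means 10^6 bytes.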
IMPACT Enables more efficient deployment of advanced State Space Models on resource-constrained edge devices.
AI-generated summary · Google Gemini · from 1 source.