PulseAugur

Ternary Mamba achieves 3.61x compression via QAT with knowledge distillation

Researchers have developed a method for compressing State Space Models (SSMs) such as Mamba-2, substantially reducing their memory footprint for edge deployment. They quantize weights to ternary values (W1.58A16: about 1.58 bits per weight, 16-bit activations) using grouped quantization-aware training (QAT) with knowledge distillation from the pretrained FP16 model, and compress Mamba-2 1.3B to 744 MB, a 3.61x reduction. Because the method starts from a pretrained checkpoint instead of training from scratch (prior ternary SSM work, Slender-Mamba, trained on 150B tokens), it reaches competitive zero-shot accuracy on a much smaller token budget. The authors also identify a new training instability, which they call "zero-ratio collapse" and describe as specific to QAT from pretrained SSMs.
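The summary does not say how the weights are grouped or scaled. The sketch below assumes BitNet b1.58-style absmean ternarization applied independently to contiguous groups of `group_size` weights. `ternarize_group`, `grouped_fake_quant`, and `zero_ratio` are hypothetical helpers, not the paper's code. The sketch covers only the forward fake-quantization step (QAT would pair it with a straight-through estimator in the backward pass). It also reports the fraction of weights mapped to zero, which is one plausible quantity to watch given the "zero-ratio collapse" the authors describe.

```python
from typing import List, Tuple


def ternarize_group(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map one group of weights to {-1, 0, +1} with a single shared scale.

    Absmean rule (an assumption, BitNet b1.58-style):
    scale = mean(|w|), code = clip(round(w / scale), -1, 1).
    """
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)  # eps guards all-zero groups
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def grouped_fake_quant(weights: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Ternarize contiguous groups independently (hypothetical helper).

    Returns the ternary codes and the dequantized weights (code * group scale)
    that a QAT forward pass would use in place of the full-precision weights.
    """
    codes: List[int] = []
    dequant: List[float] = []
    for start in range(0, len(weights), group_size):
        group_codes, scale = ternarize_group(weights[start:start + group_size])
        codes.extend(group_codes)
        dequant.extend(scale * c for c in group_codes)
    return codes, dequant


def zero_ratio(codes: List[int]) -> float:
    """Fraction of weights whose ternary code is 0."""
    return sum(1 for c in codes if c == 0) / len(codes)


if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.03, -0.01, 0.6]
    codes, dequant = grouped_fake_quant(w, group_size=4)
    print(codes)              # [1, 0, 1, -1, 0, 0, 0, 1]
    print(zero_ratio(codes))  # 0.5
```

Each group gets its own scale, so a large weight in one group (such as -1.2 here) does not inflate the scale applied to small weights in other groups. That is the usual motivation for grouping, though the paper's group size and layout are not given in the summary.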

IMPACT Makes State Space Models more practical to deploy on memory-constrained edge devices by cutting their memory footprint roughly 3.6x.
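The reported figures can be sanity-checked with simple arithmetic. This is a minimal sketch that assumes "MB" means 10^6 bytes and that the 3.61x ratio is measured against the FP16 checkpoint. It back-computes the implied FP16 size and compares it with the ideal storage for 1.3B ternary weights at log2(3) ≈ 1.58 bits each. It also shows one common way to approach that bound: packing five ternary values into a byte, since 3^5 = 243 ≤ 256. This packing scheme is an assumption and is not taken from the paper. All function names are hypothetical.

```python
import math
from typing import List

# Figures quoted in the summary.
COMPRESSED_MB = 744.0
REPORTED_RATIO = 3.61
NOMINAL_PARAMS = 1.3e9  # "Mamba-2 1.3B"; the exact parameter count is not given


def implied_fp16_mb(compressed_mb: float, ratio: float) -> float:
    """Uncompressed size implied by the reported compression ratio."""
    return compressed_mb * ratio


def ideal_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Storage for n_params weights at a given bit width (MB = 1e6 bytes, assumed)."""
    return n_params * bits_per_weight / 8 / 1e6


def pack_trits(codes: List[int]) -> bytes:
    """Pack ternary codes in {-1, 0, +1} five per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(codes), 5):
        value = 0
        for c in reversed(codes[i:i + 5]):
            value = value * 3 + (c + 1)  # shift {-1, 0, 1} to base-3 digits {0, 1, 2}
        out.append(value)
    return bytes(out)


def unpack_trits(data: bytes, n: int) -> List[int]:
    """Inverse of pack_trits; n drops the padding digits of the final byte."""
    codes: List[int] = []
    for byte in data:
        for _ in range(5):
            codes.append(byte % 3 - 1)
            byte //= 3
    return codes[:n]


if __name__ == "__main__":
    fp16 = implied_fp16_mb(COMPRESSED_MB, REPORTED_RATIO)
    print(f"implied FP16 size: {fp16:.0f} MB (~{fp16 * 1e6 / 2 / 1e9:.2f}B params at 2 bytes each)")
    print(f"ideal ternary size: {ideal_size_mb(NOMINAL_PARAMS, math.log2(3)):.0f} MB")
    print(f"5-trits-per-byte size: {ideal_size_mb(NOMINAL_PARAMS, 8 / 5):.0f} MB")
```

With these inputs the implied FP16 size is about 2,686 MB, which is consistent with roughly 1.34B parameters at 2 bytes each. By contrast, 1.3B weights at log2(3) bits would need only about 258 MB. The summary does not explain the gap to 744 MB. Per-group scales and any tensors kept in higher precision are plausible contributors, but that is an assumption.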

RANK_REASON The cluster describes a research paper proposing a new model-compression method.

Read on arXiv cs.AI

AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.AI · TIER_1 · English (EN) · Ramprasath Ganesaraja, Sahil Dilip Panse, Swathika N

    Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

    arXiv:2606.18114v1 · Announce Type: cross · Abstract: State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint su…

  2. arXiv cs.AI · TIER_1 · English (EN) · Swathika N

    Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

    State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,00…
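Both abstracts stress starting from a pretrained checkpoint, and the summary says the FP16 model serves as a distillation teacher. The exact distillation loss is not given. The sketch below assumes the standard temperature-scaled KL divergence between teacher and student next-token distributions, averaged over positions. `kd_loss` and `sequence_kd_loss` are hypothetical names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) for one token's vocabulary distribution (assumed form)."""
    p = softmax(teacher_logits, temperature)  # soft targets from the frozen FP16 teacher
    q = softmax(student_logits, temperature)  # ternary student's prediction
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Multiplying by T^2 keeps gradient magnitudes roughly independent of T.
    return temperature ** 2 * kl


def sequence_kd_loss(student_seq: Sequence[Sequence[float]],
                     teacher_seq: Sequence[Sequence[float]],
                     temperature: float = 2.0) -> float:
    """Mean per-token distillation loss over a sequence of logit vectors."""
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_seq, teacher_seq)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.5, 0.7, -0.5], [0.0, 0.5, 2.0]]
    print(sequence_kd_loss(student, teacher))
```

In a QAT loop, the student logits would come from the model running on fake-quantized ternary weights, and the teacher logits would come from the frozen FP16 checkpoint. The summary does not say whether the paper also adds a next-token cross-entropy term.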