Ternary Mamba achieves 3.61x compression via QAT with knowledge distillation

By PulseAugur Editorial · [2 sources] · 2026-06-16 16:18

Researchers have developed a method for compressing State Space Models (SSMs) such as Mamba-2, reducing their memory footprint for edge deployment. Using grouped quantization-aware training (QAT) with knowledge distillation from a pre-trained FP16 model, they compressed Mamba-2 1.3B to 744 MB, a 3.61x reduction. The approach reaches competitive zero-shot accuracy on a much smaller token budget than prior work that trains from scratch, and the authors identify an instability they call "zero-ratio collapse", which they report as unique to QAT from pre-trained SSMs.
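To make "grouped" ternary quantization concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the per-group absmean scale follows the BitNet b1.58 recipe, and the group size, function names, and the `zero_ratio` diagnostic are hypothetical. The summary does not define "zero-ratio collapse"; the sketch only measures the fraction of weights that land on 0, which is presumably the quantity involved.

```python
def ternary_quantize_grouped(weights, group_size=4, eps=1e-8):
    """Map each weight to -1, 0, or +1, with one scale per group.

    Assumption: absmean scaling per group (BitNet b1.58 style).
    The summarized paper may use a different scale or group size.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group: the mean absolute value of its weights.
        scale = sum(abs(x) for x in group) / group_size + eps
        scales.append(scale)
        for x in group:
            # Round to the nearest integer, then clip to the ternary range.
            codes.append(max(-1, min(1, round(x / scale))))
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Rebuild approximate weights from ternary codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def zero_ratio(codes):
    """Fraction of weights quantized to exactly 0 (hypothetical diagnostic)."""
    return codes.count(0) / len(codes)


weights = [0.9, -0.05, 0.4, -1.2, 0.02, 0.01, -0.03, 0.5]
codes, scales = ternary_quantize_grouped(weights, group_size=4)
approx = dequantize(codes, scales, group_size=4)
print(codes)              # [1, 0, 1, -1, 0, 0, 0, 1]
print(zero_ratio(codes))  # 0.5
```

In the second group a single large weight (0.5) raises the scale enough that the three small weights all round to 0. During QAT the forward pass would use the dequantized weights while gradients flow to the full-precision copy, typically through a straight-through estimator; that part needs an autograd framework and is omitted here.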

IMPACT Enables more efficient deployment of State Space Models on edge devices by significantly reducing memory footprint.
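The reported figures can be sanity-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming MB means 10^6 bytes and that the FP16 baseline stores 2 bytes per parameter; neither assumption is stated in the summary, and the variable names are illustrative.

```python
import math

# A ternary weight has 3 states, so it carries log2(3) bits of information.
# This is the "1.58" in the W1.58A16 label used in the paper title.
bits_per_ternary_weight = math.log2(3)

compressed_mb = 744        # reported compressed size
compression_ratio = 3.61   # reported reduction

# Baseline size implied by the two reported numbers.
implied_baseline_mb = compressed_mb * compression_ratio

# Assumption: MB = 10^6 bytes, FP16 = 2 bytes per parameter.
implied_params_billion = implied_baseline_mb * 1e6 / 2 / 1e9

# Average bits per parameter across the whole compressed checkpoint.
effective_bits = 16 / compression_ratio

print(f"{bits_per_ternary_weight:.3f} bits per ternary weight")
print(f"{implied_baseline_mb:.0f} MB implied FP16 baseline")
print(f"{implied_params_billion:.2f}B implied parameters")
print(f"{effective_bits:.2f} effective bits per parameter")
```

Under these assumptions the implied baseline is about 2.7 GB, consistent with a roughly 1.3B-parameter FP16 model. The effective rate of about 4.4 bits per parameter is well above 1.58, which suggests that part of the checkpoint (for example group scales or tensors left unquantized) stays at higher precision; the summary does not say which.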



AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Ramprasath Ganesaraja, Sahil Dilip Panse, Swathika N · 2026-06-17 04:00

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

arXiv:2606.18114v1 (cross-list). Abstract: State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint su…

arXiv cs.AI TIER_1 English(EN) · Swathika N · 2026-06-16 16:18

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,00…
