Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 5d

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

Researchers have developed a new method for adapting pruned Large Language Models (LLMs) called local reconstruction. The technique adapts one subset of model parameters at a time so that the adapted subset reproduces the original dense model's activations, and it proves effective for models of up to 72 billion parameters. Local reconstruction achieves performance comparable to full retraining while requiring significantly less data and compute. Its effectiveness is also largely independent of the window size, provided the window includes at least one nonlinear submodule. The study further found that this adaptation step reduces the importance of the pruning criterion itself, making simpler pruning techniques more competitive at larger model scales.
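The adaptation step described above can be sketched for a single window. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation. The window is a hypothetical two-layer block (linear, ReLU, linear), so it contains a nonlinear submodule. Magnitude pruning stands in for a "simple" pruning criterion. Plain gradient descent on a mean-squared-error objective stands in for whatever optimizer the authors use. All function names (`magnitude_mask`, `local_reconstruct`, `window_loss`) are made up for illustration. Only the surviving weights are updated, and the pruning masks are reapplied after every step.

```python
import random

def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    # Hypothetical window: linear -> ReLU (the nonlinear submodule) -> linear.
    h_pre = matvec(W1, x)
    h = [max(0.0, a) for a in h_pre]
    return h_pre, h, matvec(W2, h)

def magnitude_mask(W, sparsity):
    # Assumed simple criterion: zero the smallest-|w| fraction of weights.
    flat = sorted(abs(w) for row in W for w in row)
    k = int(sparsity * len(flat))
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[1.0 if abs(w) > threshold else 0.0 for w in row] for row in W]

def apply_mask(W, M):
    return [[w * m for w, m in zip(rw, rm)] for rw, rm in zip(W, M)]

def window_loss(W1, W2, D1, D2, X):
    # Mean squared error between pruned-window and dense-window outputs.
    total = 0.0
    for x in X:
        y = forward(W1, W2, x)[2]
        t = forward(D1, D2, x)[2]
        total += sum((a - b) ** 2 for a, b in zip(y, t))
    return total / len(X)

def local_reconstruct(D1, D2, M1, M2, X, lr=0.02, steps=100):
    # Gradient descent on the surviving weights of one window only;
    # the dense window's outputs serve as fixed regression targets.
    W1, W2 = apply_mask(D1, M1), apply_mask(D2, M2)
    targets = [forward(D1, D2, x)[2] for x in X]
    n = len(X)
    for _ in range(steps):
        G1 = [[0.0] * len(row) for row in W1]
        G2 = [[0.0] * len(row) for row in W2]
        for x, t in zip(X, targets):
            h_pre, h, y = forward(W1, W2, x)
            dy = [2.0 * (a - b) / n for a, b in zip(y, t)]
            for i, d in enumerate(dy):  # dL/dW2 = dy * h^T
                for j, hj in enumerate(h):
                    G2[i][j] += d * hj
            # Backpropagate through W2 and the ReLU.
            dh = [sum(W2[i][j] * dy[i] for i in range(len(dy))) for j in range(len(h))]
            dpre = [g if p > 0 else 0.0 for g, p in zip(dh, h_pre)]
            for i, d in enumerate(dpre):  # dL/dW1 = dpre * x^T
                for j, xj in enumerate(x):
                    G1[i][j] += d * xj
        # Re-apply the masks so pruned weights stay exactly zero.
        W1 = [[(w - lr * g) * m for w, g, m in zip(rw, rg, rm)]
              for rw, rg, rm in zip(W1, G1, M1)]
        W2 = [[(w - lr * g) * m for w, g, m in zip(rw, rg, rm)]
              for rw, rg, rm in zip(W2, G2, M2)]
    return W1, W2

# Toy demo with random dense weights and random calibration inputs.
random.seed(0)
d_in, d_hid, d_out = 4, 8, 3
D1 = [[random.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_hid)]
D2 = [[random.gauss(0, 0.5) for _ in range(d_hid)] for _ in range(d_out)]
X = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(32)]
M1, M2 = magnitude_mask(D1, 0.5), magnitude_mask(D2, 0.5)

before = window_loss(apply_mask(D1, M1), apply_mask(D2, M2), D1, D2, X)
W1, W2 = local_reconstruct(D1, D2, M1, M2, X)
after = window_loss(W1, W2, D1, D2, X)
print(f"reconstruction MSE: {before:.4f} -> {after:.4f}")
```

In a full model, this step would be repeated window by window in sequence, with the dense model's activations as the targets for each window. Because each optimization touches only one small window, it needs far less memory and data than retraining the whole network at once.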

IMPACT This research offers a more efficient way to adapt pruned LLMs, potentially lowering inference costs and making simpler pruning methods viable for large-scale models.