CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search
Researchers have developed CRePE, a post-training pruning method for large language models that incorporates 2D local neighborhood context and adaptive coefficients when estimating relative importance. According to the summary, it outperforms existing pruning techniques across a range of models and sparsity levels. To speed up the search for its hyperparameters, the authors also introduced PHO, a proxy-based hyperparameter optimization method that cuts search time from hours to minutes and generalizes well across different models.
AI IMPACT: Reduces computational costs for LLM deployment, potentially accelerating adoption and enabling more efficient model usage.
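The summary does not give CRePE's scoring formula, so the following is only a minimal sketch of the general idea of blending a per-weight importance score with context from its 2D neighborhood. Everything in it is an assumption made for illustration: the magnitude-based base score, the 3x3 mean filter, the single blending coefficient `alpha`, and the function names `neighborhood_mean` and `prune_mask`. None of these are taken from the paper.

```python
import numpy as np


def neighborhood_mean(scores, k=3):
    """Mean of each entry's k x k neighborhood, using edge padding (k odd)."""
    pad = k // 2
    padded = np.pad(scores, pad, mode="edge")
    rows, cols = scores.shape
    out = np.zeros_like(scores, dtype=float)
    # Sum the k*k shifted views of the padded array, then average.
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + rows, dj:dj + cols]
    return out / (k * k)


def prune_mask(weights, sparsity, alpha=0.5, k=3):
    """Return a boolean keep-mask that drops the lowest-scoring weights.

    Hypothetical scoring rule (not CRePE's): blend |w| with the mean |w|
    of its 2D neighborhood, weighted by the coefficient alpha.
    """
    base = np.abs(weights).astype(float)
    score = (1.0 - alpha) * base + alpha * neighborhood_mean(base, k)
    n_prune = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # Indices of the n_prune smallest scores (order among them is arbitrary).
        prune_idx = np.argpartition(score.ravel(), n_prune - 1)[:n_prune]
        mask[prune_idx] = False
    return mask.reshape(weights.shape)


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
keep = prune_mask(W, sparsity=0.5, alpha=0.5)
W_pruned = W * keep  # pruned weights are zeroed, the rest are unchanged
```

With `alpha=0` this sketch reduces to plain magnitude pruning. A coefficient like `alpha` is the kind of hyperparameter a search procedure would tune, although how PHO does this is not described in the summary.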