Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
Researchers have developed Causal Attribution Pruning (CAP), a training-free method that shrinks large language models while preserving their reasoning capabilities. CAP measures each attention head's causal impact on reasoning tasks and prunes the heads that matter least. The approach showed significant improvements over existing methods such as Wanda, particularly on benchmarks such as ARC-Challenge, and showed promise for models such as Llama-3 and Mistral-7B-Instruct at moderate sparsity levels.
AI IMPACT: This method could lead to more efficient LLMs, reducing inference costs and making advanced reasoning capabilities more accessible.
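
To illustrate the general idea of ranking attention heads by causal impact, here is a minimal sketch that ablates one head at a time in a toy model and records how much the task loss rises. It is not the CAP algorithm, whose details are not given in this summary. The toy model (logits as a plain sum of per-head contributions), the two-example dataset, and all function names are hypothetical and chosen only for illustration.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the correct label under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def forward(head_outputs, mask):
    """Toy stand-in for a model: logits are the sum of unmasked head contributions."""
    n_classes = len(head_outputs[0])
    return [
        sum(mask[h] * head_outputs[h][c] for h in range(len(head_outputs)))
        for c in range(n_classes)
    ]

def mean_loss(dataset, mask):
    """Average cross-entropy over the dataset with the given head mask applied."""
    total = sum(cross_entropy(forward(heads, mask), label) for heads, label in dataset)
    return total / len(dataset)

def head_attribution(dataset, n_heads):
    """Score each head by the loss increase caused by ablating that head alone."""
    full_mask = [1.0] * n_heads
    baseline = mean_loss(dataset, full_mask)
    scores = []
    for h in range(n_heads):
        mask = full_mask.copy()
        mask[h] = 0.0  # intervention: zero out this head's contribution
        scores.append(mean_loss(dataset, mask) - baseline)
    return scores

def select_heads_to_prune(scores, n_prune):
    """Return the indices of the n_prune heads whose ablation hurts the loss least."""
    order = sorted(range(len(scores)), key=lambda h: scores[h])
    return sorted(order[:n_prune])

# Hypothetical data: each example is (per-head logit contributions, correct label).
# Head 0 carries most of the signal, head 1 contributes nothing, head 2 helps a little.
dataset = [
    ([[2.0, 0.0], [0.0, 0.0], [0.5, 0.0]], 0),
    ([[0.0, 2.0], [0.0, 0.0], [0.0, 0.5]], 1),
]

scores = head_attribution(dataset, n_heads=3)
to_prune = select_heads_to_prune(scores, n_prune=1)
print("ablation scores:", scores)
print("heads to prune:", to_prune)
```

In this toy setup the head that contributes nothing to the logits gets the lowest score and is pruned first. A real model would need the same intervention applied inside its attention layers, and single-head ablation ignores interactions between heads, so treat the ranking as a first-order estimate.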