Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 11h

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

Researchers have introduced Causal Attribution Pruning (CAP), a training-free method that shrinks large language models while preserving their reasoning capabilities. CAP measures each attention head's causal impact on reasoning tasks and prunes the heads that matter least. In the reported results, CAP significantly outperforms existing methods such as Wanda, particularly on ARC-Challenge, and shows promise on Llama-3 and Mistral-7B-Instruct at moderate sparsity levels.
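To make the idea of scoring heads by causal impact concrete, here is a minimal toy sketch. It is not the paper's implementation: it assumes a stand-in "model" whose prediction is simply the sum of per-head outputs, uses zero-ablation with a mean-squared-error loss as the causal measure, and the names `ablation_scores` and `heads_to_prune` are hypothetical.

```python
import numpy as np


def ablation_scores(head_outputs, targets):
    """Score each head by how much the loss rises when that head is zeroed.

    head_outputs: array of shape (num_heads, num_examples, dim)
    targets:      array of shape (num_examples, dim)
    Toy assumption: the model's prediction is the sum of all head outputs.
    """
    def loss(mask):
        pred = (head_outputs * mask[:, None, None]).sum(axis=0)
        return float(np.mean((pred - targets) ** 2))

    num_heads = head_outputs.shape[0]
    full = np.ones(num_heads)
    base = loss(full)
    scores = np.empty(num_heads)
    for h in range(num_heads):
        mask = full.copy()
        mask[h] = 0.0  # intervene: remove head h only
        scores[h] = loss(mask) - base
    return scores


def heads_to_prune(scores, sparsity):
    """Return the indices of the lowest-impact heads for a given sparsity."""
    k = int(len(scores) * sparsity)
    return sorted(np.argsort(scores)[:k].tolist())


# Synthetic data: heads 0 and 2 carry the signal, heads 1 and 3 are near-zero noise.
rng = np.random.default_rng(0)
signal_a = rng.normal(size=(64, 8))
signal_b = rng.normal(size=(64, 8))
noise_a = 0.01 * rng.normal(size=(64, 8))
noise_b = 0.01 * rng.normal(size=(64, 8))
head_outputs = np.stack([signal_a, noise_a, signal_b, noise_b])
targets = signal_a + signal_b

scores = ablation_scores(head_outputs, targets)
pruned = heads_to_prune(scores, sparsity=0.5)
print(pruned)
```

In this toy setup the two noise heads receive the lowest scores and are the ones selected for removal. A real pipeline would run the interventions inside an actual transformer and measure task performance on reasoning benchmarks rather than a synthetic loss.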

IMPACT This method could lead to more efficient LLMs, reducing inference costs and making advanced reasoning capabilities more accessible.

Llama 3
GSM8K
StrategyQA
multilayer perceptron
LLaMA-3-8B-Instruct
ARC challenge
Mistral-7B-Instruct
Wanda
Causal Attribution Pruning