Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 7h

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

Researchers have introduced SAGE (Spectral Activation-GEometry Sanitization), a post-hoc method for preserving the performance of large language models (LLMs) on retained data after unlearning. Current unlearning techniques often degrade performance on the data the model is supposed to keep. SAGE addresses this by analyzing the activation geometry of the retained data and using it to correct the final unlearning update vector, without rerunning the unlearning pipeline. The approach is reported to consistently mitigate the trade-off between forgetting and retention across a range of unlearning methods and model scales.
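To make the idea of correcting an update vector from retain-data activation geometry more concrete, here is a minimal sketch of one possible reading. It is not the paper's algorithm: the brief gives no formulas, so the function name `sanitize_update`, the per-layer weight-difference framing, the use of an SVD, and the cutoff `k` are all assumptions made for illustration. The sketch takes the difference between unlearned and original weights for one linear layer and removes the part of it that acts on the dominant input directions of the retain activations.

```python
import numpy as np


def sanitize_update(delta_w, retain_acts, k):
    """Hypothetical helper (not from the paper): project a weight update
    away from the top-k input directions of the retain activations.

    delta_w     : (d_out, d_in) array, unlearned weights minus original weights
    retain_acts : (n, d_in) array, layer inputs collected on retain data
    k           : number of leading spectral directions to protect (assumed knob)
    """
    # Rows of vt are orthonormal right singular vectors, ordered by
    # decreasing singular value; the first k span the dominant
    # input directions seen on retain data.
    _, _, vt = np.linalg.svd(retain_acts, full_matrices=False)
    basis = vt[:k]  # shape (k, d_in)

    # Subtract the component of the update that acts on those directions,
    # leaving the layer's response to them unchanged.
    return delta_w - (delta_w @ basis.T) @ basis


# Toy usage: retain activations that lie in a 2-dimensional subspace.
rng = np.random.default_rng(0)
directions = rng.normal(size=(2, 4))
retain_acts = rng.normal(size=(50, 2)) @ directions
delta_w = rng.normal(size=(3, 4))

clean = sanitize_update(delta_w, retain_acts, k=2)
# Change in layer output on retain inputs caused by the sanitized update.
shift = retain_acts @ clean.T
```

In this toy setting the retain activations have exact rank 2, so protecting two directions removes the update's effect on every retain input. Real activations are not exactly low rank, so any such cutoff would trade retention against how much of the forgetting update survives.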

IMPACT Enhances LLM unlearning techniques, potentially leading to more robust and performant models after data removal.

Hugging Face
arXiv
large language model
SAGE
DagsHub
alphaXiv
IArxiv
cs.AI
Spectral Activation-GEometry Sanitization