When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search
Researchers have developed a framework called GRACE to make activation steering in large language models more efficient. It addresses the challenge of finding effective steering directions by using geometric properties of model activations to guide the search. The framework aims to reduce the computational cost of controlling LLMs without retraining, making concept manipulation more accessible.
IMPACT: Reduces the computational cost of controlling LLMs, potentially enabling wider use of activation steering techniques.