Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

A Geometric Account of Activation Steering through Angle-Norm Decomposition

Researchers have developed a geometric framework for understanding activation steering in language models by decomposing hidden states into an angle (direction) and a norm (magnitude). Based on an empirical study across seven models, their work suggests that concepts are represented primarily in the angular component, which supports spherical steering methods. However, the study also highlights the continued importance of the hidden-state norm for steering stability and downstream effects, and it proposes that interventions be parameterized by both angular and radial components.
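To make the angle-norm split concrete, here is a minimal sketch in plain Python. It assumes a hidden state is a list of floats and that a concept is given as a direction vector. The helper names (`decompose`, `rotate_toward`, `steer`) and the parameters `theta` and `norm_scale` are hypothetical illustrations, not the paper's method or any library's API. The angular step rotates the unit direction toward the concept on the sphere, and the radial step rescales the norm independently.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def decompose(h: Vector) -> Tuple[float, Vector]:
    """Split a hidden state into its norm (radial part) and unit direction (angular part)."""
    r = norm(h)
    if r == 0.0:
        raise ValueError("cannot decompose the zero vector")
    return r, [x / r for x in h]


def rotate_toward(u: Vector, v: Vector, theta: float) -> Vector:
    """Rotate unit vector u by theta radians toward direction v, in the plane spanned by u and v.

    theta is clamped to the angle between u and v, so the rotation never overshoots v.
    """
    v_len = norm(v)
    v_unit = [x / v_len for x in v]
    cos_phi = max(-1.0, min(1.0, dot(u, v_unit)))
    phi = math.acos(cos_phi)
    # Component of v orthogonal to u; it defines the rotation plane.
    w = [vx - cos_phi * ux for ux, vx in zip(u, v_unit)]
    w_len = norm(w)
    if w_len < 1e-12:
        # u is parallel or antiparallel to v, so the rotation plane is undefined.
        return list(u)
    w = [x / w_len for x in w]
    t = min(theta, phi)
    return [math.cos(t) * ux + math.sin(t) * wx for ux, wx in zip(u, w)]


def steer(h: Vector, concept: Vector, theta: float, norm_scale: float = 1.0) -> Vector:
    """Hypothetical angle-norm steering: rotate the direction, then set the norm separately."""
    r, u = decompose(h)
    u_new = rotate_toward(u, concept, theta)
    return [norm_scale * r * x for x in u_new]
```

For example, `steer([3.0, 4.0], [0.0, 1.0], math.pi / 2)` rotates the direction fully onto the concept while keeping the norm at 5, giving approximately `[0.0, 5.0]`. Passing `norm_scale=2.0` would instead double the norm without changing the direction. Keeping the two knobs separate mirrors the brief's point that angle and norm play different roles, though the actual parameterization in the paper may differ.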

IMPACT Provides a more nuanced understanding of how to control and interpret language model behavior, potentially leading to more stable and predictable AI systems.
