PulseAugur

New methods enhance LLM control without sacrificing performance or reasoning

Researchers have developed new methods for steering large language model (LLM) behavior at inference time without sacrificing generation quality. One approach, Prompt-only SV (PrOSV), intervenes only on prompt tokens and outperforms traditional full-sequence steering vectors on benchmarks such as AxBench. Another, FLAS (Flow-based Activation Steering), learns a concept-conditioned velocity field that transports activations, consistently outperforming prompting on Gemma models. A third, SKOP (Steering via Key-Orthogonal Projections), constrains attention rerouting to preserve reasoning and retrieval performance, achieving a better trade-off between steering efficacy and utility.

Summary written by gemini-2.5-flash-lite from 5 sources. How we write summaries →

IMPACT New techniques for inference-time LLM control could enable more nuanced and reliable AI applications by improving steering accuracy and reducing performance degradation.

RANK_REASON Three new arXiv papers introduce novel methods for controlling LLM behavior at inference time.

Read on arXiv cs.LG →
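To make the summary concrete, here is a minimal NumPy sketch of the two core ideas: plain additive activation steering, and the key-orthogonal variant that motivates SKOP. All names, dimensions, and the scaling coefficient are illustrative assumptions, not the papers' actual implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_keys = 8, 4           # hypothetical sizes for illustration
h = rng.normal(size=d_model)     # a residual-stream activation
v = rng.normal(size=d_model)     # a learned steering vector (assumed given)

# Plain additive steering: shift the activation along the concept direction.
h_steered = h + 1.5 * v

# Key-orthogonal steering (the idea behind SKOP, sketched): project the
# steering vector onto the orthogonal complement of the attention-key
# subspace, so query-key matching -- and hence attention routing -- is
# left approximately untouched by the intervention.
K = rng.normal(size=(n_keys, d_model))   # rows: key vectors at this layer
Q, _ = np.linalg.qr(K.T)                 # orthonormal basis of the key subspace
v_orth = v - Q @ (Q.T @ v)               # strip the key-subspace component
h_skop = h + 1.5 * v_orth

# The projected vector has (numerically) zero overlap with every key,
# so the steering shift cannot reroute attention through these keys.
assert np.allclose(K @ v_orth, 0.0)
```

The design point this illustrates: both variants modify only activations at inference time (model weights stay frozen); the orthogonal projection trades a small amount of steering strength for preserving the attention pattern.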

COVERAGE [5]

  1. arXiv cs.LG TIER_1 · Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin ·

    Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

    arXiv:2605.05983v1 Announce Type: new Abstract: Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current a…

  2. arXiv cs.LG TIER_1 · Zehao Jin, Ruixuan Deng, Junran Wang, Xinjie Shen, Chao Zhang ·

    Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

    arXiv:2605.05892v1 Announce Type: cross Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations…

  3. arXiv cs.CL TIER_1 · Haoyan Luo, Mateo Espinosa Zarlenga, Mateja Jamnik ·

    Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

    arXiv:2605.06342v1 Announce Type: new Abstract: Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention …

  4. arXiv cs.CL TIER_1 · Mateja Jamnik ·

    Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

    Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matc…

  5. arXiv cs.CL TIER_1 · Chao Zhang ·

    Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

    Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering metho…
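The flow-based alternative covered in items 2 and 5 can also be sketched in a few lines. This is a toy illustration of the general idea only: instead of adding one fixed vector, the activation is transported along a concept-conditioned velocity field by numerical integration. The linear field, step count, and function names below are assumptions for illustration, not the FLAS architecture.

```python
import numpy as np

def velocity(h, t, concept):
    # Hypothetical stand-in for a trained velocity network; here a toy
    # linear field that pulls the activation toward a concept anchor.
    return concept - h

def steer_flow(h, concept, n_steps=10):
    # Transport the activation along the field with simple Euler steps,
    # keeping model parameters frozen (only activations change).
    dt = 1.0 / n_steps
    for k in range(n_steps):
        h = h + dt * velocity(h, k * dt, concept)
    return h

h0 = np.zeros(4)          # starting activation (toy)
c = np.ones(4)            # concept anchor (toy)
h1 = steer_flow(h0, c)    # moves smoothly toward c, no discontinuous jump
```

With this toy linear field each Euler step contracts the distance to the anchor by a constant factor, so the trajectory approaches the concept direction smoothly rather than jumping there in one additive shift.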