Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

New methods strengthen control over large language models without sacrificing performance or reasoning ability

By the PulseAugur editorial team · [5 sources] · 2026-05-07 09:03

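Researchers have developed new methods for steering the behavior of large language models (LLMs) at inference time without sacrificing generation quality. The first, prompt-only steering vectors (Prompt-only SV, PrOSV), intervenes only on prompt tokens and outperforms conventional full-sequence steering vectors on benchmarks such as AxBench. The second, FLAS (Flow-based Activation Steering), learns a concept-conditioned velocity field that transports activations, and consistently outperforms prompting-based methods on Gemma models. The third, SKOP (Steering via Key-Orthogonal Projection), limits the attention rerouting that steering causes so that reasoning and retrieval performance are preserved, achieving a better trade-off between utility and steering effectiveness.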
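To make the distinction between prompt-only and full-sequence intervention concrete, here is a minimal Python sketch. It illustrates only where the vector is added, not how PrOSV trains it. The function name `apply_steering`, its parameters, and the toy activations are assumptions made for illustration and do not come from the papers.

```python
from typing import List

Vector = List[float]


def apply_steering(
    hidden: List[Vector],
    steering_vector: Vector,
    prompt_len: int,
    prompt_only: bool = True,
    scale: float = 1.0,
) -> List[Vector]:
    """Add a steering vector to per-token activations (hypothetical helper).

    hidden: one activation vector per position, prompt tokens first,
        followed by generated tokens.
    prompt_len: number of leading positions that belong to the prompt.
    prompt_only: if True, only prompt positions are modified; otherwise
        every position in the sequence is modified.
    """
    steered = []
    for position, activation in enumerate(hidden):
        intervene = (position < prompt_len) or not prompt_only
        if intervene:
            # Shift the activation along the steering direction.
            activation = [a + scale * s for a, s in zip(activation, steering_vector)]
        else:
            # Leave generated-token activations untouched (copied, not aliased).
            activation = list(activation)
        steered.append(activation)
    return steered


# Toy example: 3 prompt tokens followed by 2 generated tokens, width 3.
hidden = [[0.0, 0.0, 0.0] for _ in range(5)]
sv = [1.0, 0.0, -1.0]  # hypothetical steering vector

prompt_only_out = apply_steering(hidden, sv, prompt_len=3, prompt_only=True)
full_sequence_out = apply_steering(hidden, sv, prompt_len=3, prompt_only=False)
```

In this toy setup the prompt-only variant changes the first three positions and leaves the last two unchanged, while the full-sequence variant changes all five. In a real transformer, generated tokens could still be influenced indirectly through attention over the steered prompt positions.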

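Impact: New techniques for controlling LLMs at inference time could enable more fine-grained and reliable AI applications by improving steering accuracy and reducing performance degradation.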

Ranking rationale: Three new arXiv papers introduce novel methods for controlling LLM behavior at inference time.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.LG TIER_1 English(EN) · Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin · 2026-05-08 04:00

Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

arXiv:2605.05983v1. Abstract: Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current a…
arXiv cs.LG TIER_1 English(EN) · Zehao Jin, Ruixuan Deng, Junran Wang, Xinjie Shen, Chao Zhang · 2026-05-08 04:00

Beyond Steering Vectors: Flow-based Activation Steering for Inference-Time Intervention

arXiv:2605.05892v1. Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations…
arXiv cs.CL TIER_1 English(EN) · Haoyan Luo, Mateo Espinosa Zarlenga, Mateja Jamnik · 2026-05-08 04:00

Don't Get Distracted: Activation Steering via Key-Orthogonal Projection

arXiv:2605.06342v1. Abstract: Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention …
arXiv cs.CL TIER_1 English(EN) · Mateja Jamnik · 2026-05-07 14:29

Don't Get Distracted: Activation Steering via Key-Orthogonal Projection

Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matc…
arXiv cs.CL TIER_1 English(EN) · Chao Zhang · 2026-05-07 09:03

Beyond Steering Vectors: Flow-based Activation Steering for Inference-Time Intervention

Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering metho…

