steering vectors
PulseAugur coverage of steering vectors — every cluster mentioning steering vectors across labs, papers, and developer communities, ranked by signal.
-
AI alignment research tackles reward hacking with new techniques
Researchers are exploring methods to prevent AI models from exploiting reward functions, a phenomenon known as reward hacking. One approach involves using steering vectors to guide gradient routing, aiming to isolate un…
-
Soft prompt distillation enhances on-device LLM safety
Researchers have developed a new method for making large language models safer and more efficient for use on devices with limited resources. The technique involves using "soft prompts" combined with distillation to tran…
-
Steering vectors offer direct control over LLM tone, bypassing prompt limitations
Prompt engineering is often ineffective for controlling the tone of large language models because behavioral traits are encoded in the model's internal state, not just its input prompts. A technique called activation st…
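As a rough illustration of the general idea behind activation steering (not the specific method from the article, which the teaser above cuts off), the sketch below builds a steering vector as the difference between mean activations on two contrasting prompt sets, then adds a scaled copy of it to a layer's hidden states. Everything here is an assumption made for illustration: the function names `steering_vector` and `apply_steering`, the scale `alpha`, and the random arrays standing in for real model activations.

```python
import numpy as np


def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference of mean activations between two contrasting prompt sets.

    Both inputs have shape (num_prompts, d_model). Hypothetical helper.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def apply_steering(hidden: np.ndarray, vector: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to the hidden state at every token position.

    hidden has shape (num_tokens, d_model); vector has shape (d_model,).
    """
    return hidden + alpha * vector


# Toy stand-ins for activations captured from one layer of a model (assumption:
# in practice these would come from forward passes on, e.g., formal vs. casual prompts).
rng = np.random.default_rng(0)
d_model = 8
pos_acts = rng.normal(loc=1.0, size=(16, d_model))
neg_acts = rng.normal(loc=-1.0, size=(16, d_model))

v = steering_vector(pos_acts, neg_acts)

# Hidden states for a 5-token input; steering shifts each one along v.
hidden = rng.normal(size=(5, d_model))
steered = apply_steering(hidden, v, alpha=2.0)
```

In a real model the addition would typically happen inside a forward hook on a chosen layer, and both the layer and `alpha` would need tuning; too large a scale tends to degrade output quality.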