ENTITY steering vectors

PulseAugur coverage of steering vectors — every cluster mentioning steering vectors across labs, papers, and developer communities, ranked by signal.

Total · 30d: 3 (3 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 3 (3 over 90d)

SENTIMENT · 30D: 2 days with sentiment data

RECENT · PAGE 1/1 · 3 TOTAL

RESEARCH · CL_112642 · Jun 26 · 15:34

AI alignment research tackles reward hacking with new techniques

Researchers are exploring methods to prevent AI models from exploiting reward functions, a phenomenon known as reward hacking. One approach involves using steering vectors to guide gradient routing, aiming to isolate un…

RESEARCH · CL_79607 · Jun 8 · 12:03

Soft prompt distillation enhances on-device LLM safety

Researchers have developed a new method for making large language models safer and more efficient for use on devices with limited resources. The technique involves using "soft prompts" combined with distillation to tran…

TOOL · CL_35929 · May 17 · 20:55

Steering vectors offer direct control over LLM tone, bypassing prompt limitations

Prompt engineering is often ineffective for controlling the tone of large language models because behavioral traits are encoded in the model's internal state, not just its input prompts. A technique called activation st…
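
The summary above is cut off, so the following is only a generic illustration of what activation steering commonly looks like, not the method of the linked article. It assumes a "difference of means" construction: average the hidden activations collected on two contrasting sets of prompts, subtract one mean from the other, and add a scaled copy of the result to a hidden state at inference time. The function names, the toy three-dimensional activations, and the `alpha` scale are all hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """Difference of means: points from the 'negative' trait toward the 'positive' one."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos, neg)]


def steer(hidden_state, vector, alpha=1.0):
    """Shift a hidden state along the steering vector; alpha sets the strength."""
    return [h + alpha * v for h, v in zip(hidden_state, vector)]


# Toy stand-ins for activations recorded at one layer (hypothetical values).
formal_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
casual_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(formal_acts, casual_acts)
steered = steer([0.5, 0.5, 0.5], v, alpha=0.5)
```

In a real model the vectors would have thousands of dimensions and the addition would happen inside a forward pass at a chosen layer; the arithmetic is the same, and `alpha` trades off how strongly the tone shifts against how much the output degrades.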