activation steering

PulseAugur coverage of activation steering: every cluster mentioning activation steering across labs, papers, and developer communities, ranked by signal.

Total: 9 over 90d

Releases: 0 over 90d

Papers: 9 over 90d

Sentiment · 30d: 5 day(s) with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL

TOOL · CL_123118 · Jul 3 · 04:00

New methods improve LLM alignment and reduce deception

Researchers have developed new methods for aligning large language models (LLMs) that are more robust than previously thought. These techniques, including Steer-With-Fixed-Coefficient (SwFC), Steer-to-Target-Projection …

TOOL · CL_119638 · Jul 1 · 04:00

New white-box auditing method reveals hidden LLM biases

Researchers have developed a new framework for auditing large language models (LLMs) that goes beyond traditional black-box testing. This white-box approach utilizes activation steering to examine the model's internal w…

RESEARCH · CL_97854 · Jun 17 · 08:04

New framework enables interpretable control over AI music generation

Researchers have developed a new framework for controlling symbolic music generation models, specifically the Multitrack Music Transformer (MMT). This method uses PID feedback control and activation steering to allow fo…

RESEARCH · CL_79581 · Jun 8 · 00:00

LLM research reveals new pathways to emergent misalignment

Two new research papers explore emergent misalignment in large language models, a phenomenon in which models trained on narrow, unsafe tasks develop broader harmful behaviors. The first paper demonstrates that activation s…

TOOL · CL_72709 · Jun 5 · 04:00

Steering vectors in LLMs found to be an attack surface

Researchers have identified a new vulnerability in activation steering techniques used to control large language models. By subtly poisoning steering datasets with a small percentage of malicious tokens, an attacker can…

TOOL · CL_62843 · Jun 1 · 04:00

LLM figurative language generation signals transfer across languages

Researchers have used activation steering to investigate how multilingual large language models generate figurative language. They found that specific directions within the model's internal signals …

RESEARCH · CL_56345 · May 27 · 15:59

New research explores activation steering for AI safety data generation

A new research paper explores the effectiveness of activation steering (AS) in generating synthetic data for training safety detection models. The study found that while AS can improve classifier performance compared to…

RESEARCH · CL_44000 · May 21 · 08:11

New methods aim to boost LLM cultural awareness and equity

Researchers have developed two distinct methods to improve the cultural awareness of large language models. One approach, used by DFKI-MLT for SemEval-2026 Task 7, employs activation steering with language vectors to ad…

TOOL · CL_35929 · May 17 · 20:55

Steering vectors offer direct control over LLM tone, bypassing prompt limitations

Prompt engineering is often ineffective for controlling the tone of large language models because behavioral traits are encoded in the model's internal state, not just its input prompts. A technique called activation st…
