PulseAugur

activation steering

PulseAugur coverage of activation steering — every cluster mentioning activation steering across labs, papers, and developer communities, ranked by signal.

Total · 30d: 9 (9 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)
[Charts not reproduced: tier mix (90d), topics, sentiment (30d; 5 days with sentiment data)]

RECENT · PAGE 1/1 · 9 TOTAL
  1. TOOL · CL_123118 ·

    New methods improve LLM alignment and reduce deception

    Researchers have developed new methods for aligning large language models (LLMs) that are more robust than previously thought. These techniques, including Steer-With-Fixed-Coefficient (SwFC), Steer-to-Target-Projection …

  2. TOOL · CL_119638 ·

    New white-box auditing method reveals hidden LLM biases

    Researchers have developed a new framework for auditing large language models (LLMs) that goes beyond traditional black-box testing. This white-box approach utilizes activation steering to examine the model's internal w…

  3. RESEARCH · CL_97854 ·

    New framework enables interpretable control over AI music generation

    Researchers have developed a new framework for controlling symbolic music generation models, specifically the Multitrack Music Transformer (MMT). This method uses PID feedback control and activation steering to allow fo…

  4. RESEARCH · CL_79581 ·

    LLM research reveals new pathways to emergent misalignment

    Two new research papers explore emergent misalignment in large language models, a phenomenon where models trained on narrow, unsafe tasks develop broader harmful behaviors. The first paper demonstrates that activation s…

  5. TOOL · CL_72709 ·

    Steering vectors in LLMs found to be an attack surface

    Researchers have identified a new vulnerability in activation steering techniques used to control large language models (LLMs). By subtly poisoning steering datasets with a small percentage of malicious tokens, an attacker can…

  6. TOOL · CL_62843 ·

    LLM figurative language generation signals transfer across languages

    Researchers have used activation steering to investigate how multilingual large language models generate figurative language. They found that specific directions within the model's internal signals …

  7. RESEARCH · CL_56345 ·

    New research explores activation steering for AI safety data generation

    A new research paper explores the effectiveness of Activation Steering (AS) in generating synthetic data for training safety detection models. The study found that while AS can improve classifier performance compared to…

  8. RESEARCH · CL_44000 ·

    New methods aim to boost LLM cultural awareness and equity

    Researchers have developed two distinct methods to improve the cultural awareness of large language models. One approach, used by DFKI-MLT for SemEval-2026 Task 7, employs activation steering with language vectors to ad…

  9. TOOL · CL_35929 ·

    Steering vectors offer direct control over LLM tone, bypassing prompt limitations

    Prompt engineering is often ineffective for controlling the tone of large language models because behavioral traits are encoded in the model's internal state, not just its input prompts. A technique called activation st…