LLM activation steering improved with model-based linear optimal control

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed a method for controlling large language model (LLM) behavior at inference time by treating the model's layer-wise dynamics as locally linear systems. The approach adapts classical linear optimal control techniques to steer activations toward desired semantic targets. It provides closed-loop control with minimal computational overhead, comes with theoretical performance guarantees, and reportedly outperforms existing activation steering techniques at controlling attributes such as toxicity and truthfulness.
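To make the idea of steering through locally linear layer dynamics concrete, here is a minimal sketch of a textbook finite-horizon linear-quadratic controller with a terminal target. It is not the paper's algorithm. It assumes each layer k acts on the activation as x_{k+1} = A_k x_k + B_k u_k, where A_k stands in for a local linearization of the layer and u_k is the steering input. The function names, the cost weights R and Qf, and the random matrices in the demo are all hypothetical stand-ins.

```python
import numpy as np

def lqr_steering_gains(A_list, B_list, target, R, Qf):
    """Backward Riccati pass for x_{k+1} = A_k x_k + B_k u_k (assumed model).

    Minimizes sum_k u_k^T R u_k + (x_N - target)^T Qf (x_N - target).
    Returns one (K_k, kff_k) pair per layer, with u_k = -K_k x_k + kff_k.
    """
    P = Qf.copy()        # quadratic term of the cost-to-go
    q = Qf @ target      # linear term of the cost-to-go (tracks the target)
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # feedback gain
        kff = np.linalg.solve(S, B.T @ q)     # feedforward term
        gains.append((K, kff))
        A_cl = A - B @ K                      # closed-loop layer map
        q = A_cl.T @ q
        P = A.T @ P @ A_cl
        P = 0.5 * (P + P.T)                   # keep P numerically symmetric
    gains.reverse()
    return gains

def rollout(A_list, B_list, x0, gains=None):
    """Propagate an activation through the layers, optionally with steering."""
    x = x0.copy()
    for k, (A, B) in enumerate(zip(A_list, B_list)):
        if gains is None:
            u = np.zeros(B.shape[1])
        else:
            K, kff = gains[k]
            u = -K @ x + kff                  # closed loop: uses the current x
        x = A @ x + B @ u
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_layers = 8, 6
    # Hypothetical stand-ins for per-layer linearizations of a real model.
    A_list = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n_layers)]
    B_list = [np.eye(d) for _ in range(n_layers)]
    x0, target = rng.standard_normal(d), rng.standard_normal(d)
    gains = lqr_steering_gains(A_list, B_list, target, 1e-3 * np.eye(d), np.eye(d))
    print("unsteered error:", np.linalg.norm(rollout(A_list, B_list, x0) - target))
    print("steered error:  ", np.linalg.norm(rollout(A_list, B_list, x0, gains) - target))
```

Because each input is computed from the activation actually observed at that layer, the controller is closed-loop: the backward pass accounts for how an intervention at one layer propagates through the layers after it, which a fixed additive steering vector does not.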

COVERAGE [1]

arXiv stat.ML TIER_1 · Glen Chou · 2026-04-21 03:09

Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through…
