PulseAugur

Logit-Gap Steering

PulseAugur coverage of Logit-Gap Steering: every cluster mentioning Logit-Gap Steering across labs, papers, and developer communities, ranked by signal.

Total · 30d: 1 (1 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 1 (1 over 90d)
RECENT · 1 TOTAL
  1. TOOL · CL_15984

    New Logit-Gap Steering method efficiently measures AI alignment robustness

    Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…
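The summary above is truncated, so the exact definition used by the researchers is not shown here. The following is a minimal sketch under a stated assumption: it treats the gap as the largest next-token logit among a hypothetical set of refusal tokens minus the largest among a hypothetical set of affirmation tokens. The function name `logit_gap`, the token lists, and the logit values are illustrative placeholders, not the paper's method or data. Under this convention, a positive gap means the model currently favors refusing.

```python
from typing import Dict, Iterable


def logit_gap(
    logits: Dict[str, float],
    refusal: Iterable[str],
    affirmation: Iterable[str],
) -> float:
    """Hypothetical refusal-affirmation logit gap.

    ASSUMPTION: the gap is the best refusal-token logit minus the best
    affirmation-token logit at a single next-token position. The source's
    actual definition may aggregate tokens or positions differently.
    """
    best_refusal = max(logits[t] for t in refusal)
    best_affirmation = max(logits[t] for t in affirmation)
    return best_refusal - best_affirmation


# Illustrative (made-up) next-token logits for a single prompt.
next_token_logits = {"Sorry": 4.2, "I": 3.1, "Sure": 1.7, "Here": 0.9}

gap = logit_gap(next_token_logits, refusal=["Sorry", "I"], affirmation=["Sure", "Here"])
print(round(gap, 2))  # 2.5
```

Taking the maximum over each token set is just one simple way to reduce several candidate tokens to a single score. Alternatives such as log-sum-exp or summed probabilities would be equally plausible readings of the truncated description.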