ENTITY HHH (Helpful, Harmless, Honest)-violating outputs

PulseAugur coverage of HHH (Helpful, Harmless, Honest)-violating outputs: every cluster mentioning the term across labs, papers, and developer communities, ranked by signal.

Total · 30d (1 over 90d)

Releases · 30d (0 over 90d)

Papers · 30d (1 over 90d)

TOPICS

safety 1
paper 1

RECENT · PAGE 1/1 · 1 TOTAL

RESEARCH · CL_56345 · May 27 · 15:59

New Research Explores Activation Steering for AI Safety Data Generation

A new research paper explores the effectiveness of Activation Steering (AS) in generating synthetic data for training safety detection models. The study found that while AS can improve classifier performance compared to…
