ENTITY Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training


PulseAugur coverage of "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training": every cluster mentioning the paper across labs, papers, and developer communities, ranked by signal.


Total · 30d

1 over 90d

Releases · 30d

0 over 90d

Papers · 30d

0 over 90d


TOPICS

safety 1
policy 1
product 1

SENTIMENT · 30D

1 day with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL

COMMENTARY · CL_124324 · Jul 3 · 16:59

Hidden LLM Backdoors Pose Massive Security Risk, Experts Warn

Researchers and investors are increasingly concerned about hidden backdoors in large language models that could be triggered remotely to exfiltrate sensitive data. Anthropic researchers demonstrated in a January 2024 pa…
