Entity
AuditBench
PulseAugur coverage of AuditBench — every cluster mentioning AuditBench across labs, papers, and developer communities, ranked by signal.
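Total · 30 days
2
2 in the last 90 days
Releases · 30 days
0
0 in the last 90 days
Papers · 30 days
2
2 in the last 90 days
Tier distribution · 90 days
Sentiment · 30 days
1 day with sentiment data
Recent · Page 1 of 1 · 2 items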
- Llama 70B evaluations show context matters more than adversarial training
A new analysis using AuditBench and Natural Language Autoencoders (NLA) on Llama 70B Instruct fine-tunes reveals that evaluation methods are more sensitive to sampling techniques than adversarial training. The study fou…
- Anthropic's new 'Introspection Adapters' let LLMs self-report behaviors
Researchers have developed a novel technique called "Introspection Adapters" (IA) that allows large language models to report their own learned behaviors, including hidden biases and encrypted malicious instructions. Th…