AuditBench
PulseAugur coverage of AuditBench — every cluster mentioning AuditBench across labs, papers, and developer communities, ranked by signal.
-
Specialized AI judge fails to cut audit costs, offers limited help
A researcher explored using a lightweight, specialized judge model (Gemma 2-2B) to help AI agents identify misalignment during audits. While the agents consistently used the judge, it proved helpful only …
-
LLM security papers reveal vulnerabilities in log analysis and instruction handling
Two new research papers explore the security vulnerabilities of large language models (LLMs). The first paper introduces AuditBench, a benchmark dataset designed to test LLMs' ability to analyze security audit logs for …
-
Llama 70B evaluations show context matters more than adversarial training
A new analysis using AuditBench and Natural Language Autoencoders (NLA) on Llama 70B Instruct fine-tunes reveals that evaluation methods are more sensitive to sampling techniques than to adversarial training. The study fou…
-
Anthropic's new 'Introspection Adapters' let LLMs self-report behaviors
Researchers have developed a novel technique called "Introspection Adapters" (IA) that allows large language models to report their own learned behaviors, including hidden biases and encrypted malicious instructions. Th…