AdvBench
PulseAugur coverage of AdvBench — every cluster mentioning AdvBench across labs, papers, and developer communities, ranked by signal.
-
Researchers automate security rule generation from attack simulations
Researchers have developed a method that automatically generates security detection rules from attack simulations. The system deterministically maps findings from Breach-and-Attack-Simulation (BAS) tools to starter Sigma …
-
Hybrid defense framework boosts LLM accuracy and robustness
Researchers have developed a novel hybrid defense framework to combat both hallucinations and adversarial manipulation in large language models. This approach integrates entropy-based methods for reducing hallucinations…
-
EvoDefense uses LLMs to co-evolve defenses against black-box attacks
Researchers have developed EvoDefense, a novel approach to protect large language models (LLMs) from attacks in black-box scenarios. This system uses a guard LLM and an experience memory to continuously refine defense s…
-
New Logit-Gap Steering method efficiently measures AI alignment robustness
Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…
-
New diagnostic tool probes LLM circuits for safety and behavior insights
A new research paper introduces "Perturbation Probing," a diagnostic method for understanding the internal workings of large language models. This technique uses two forward passes per prompt to identify and analyze "be…