LlamaGuard
PulseAugur coverage of LlamaGuard: every cluster mentioning LlamaGuard across labs, papers, and developer communities, ranked by signal.
-
Google's AMS tool finds critical safety flaws in three tested LLMs
Google Cloud has open-sourced AMS (Activation Model Scanner), a tool that analyzes the geometric structure of a model's activation space to verify safety training. Unlike traditional behavioral tests, AMS directly inspe…
-
LLM Agents Vulnerable to Tool-Output Injection Attacks
LLM agents have a significant security vulnerability: malicious code can be injected through the outputs of the tools they use. This 'tool-output injection' bypasses standard input and output guardrails because …
-
AI safety models vulnerable to fine-tuning and embedding bypass attacks
Two new research papers explore vulnerabilities in AI safety mechanisms. The first, "When Safety Geometry Collapses," demonstrates how even benign fine-tuning of guard models can inadvertently destroy their safety al…
-
New proxy tool blocks prompt injection attacks on AI models
A new tool called Arc Gate acts as a proxy that sits in front of any OpenAI-compatible endpoint. It is designed to block prompt injection attacks before they can reach the underly…