WildGuard
PulseAugur coverage of WildGuard: every cluster mentioning WildGuard across labs, papers, and developer communities, ranked by signal.
-
Google's AMS tool finds critical safety flaws in three tested LLMs
Google Cloud has open-sourced AMS (Activation Model Scanner), a tool that analyzes the geometric structure of a model's activation space to verify safety training. Unlike traditional behavioral tests, AMS directly inspe…
-
New Opir models offer efficient multi-task safety classification for LLMs
Researchers have introduced Opir, a new family of encoder-based guardrail models designed for efficient multi-task safety classification in large language model applications. Opir models are built on the GLiClass archit…
-
GLiNER Guard unifies LLM safety and PII detection in single pass
A new system called GLiNER Guard (GLiGuard) has been developed to streamline safety moderation and PII detection for large language models. This unified encoder collapses multiple classifiers and NER models into a singl…
-
Fastino Labs open-sources GLiGuard safety model
Fastino Labs has released GLiGuard, an open-source safety moderation model designed to be significantly faster and more efficient than existing solutions. Unlike traditional decoder-only models that generate responses t…
-
AI safety models vulnerable to fine-tuning and embedding bypass attacks
Two new research papers explore vulnerabilities in AI safety mechanisms. The first, "When Safety Geometry Collapses," demonstrates how even benign fine-tuning of guard models can inadvertently destroy their safety al…