Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · Towards AI · English (EN) · 6d

Production-Level LLM Safety and Privacy Guardrails Family: GLiNER Guard (GLiGuard)

GLiNER Guard (GLiGuard) is a new system for safety moderation and PII detection in large language model applications. Its unified encoder collapses multiple classifiers and NER models into a single forward pass, significantly reducing processing time and cost compared with autoregressive guard models or pipelines of separate encoders. A schema-driven interface lets policies change dynamically without retraining, which makes GLiGuard a more efficient option for production LLM applications.

IMPACT Streamlines LLM safety and PII detection, reducing operational costs and improving efficiency for production deployments.
- OpenAI
- NVIDIA
- ShieldGemma
- Presidio
- WildGuard
- GLiNER2
- GLiGuard
- mmBERT-small
- GPT OSS Safeguard
- PromptGuard
- GLiNER2 Multi
- GLiNER Guard
- Llama-Guard
- Toxic-Bert
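
To illustrate what a schema-driven guard interface can look like, here is a minimal Python sketch. It does not use GLiGuard or any real model: a keyword and regex matcher stands in for the encoder, and `GuardSchema`, `run_guard`, and the label names are hypothetical. The only point is that the policy is passed in as data on each call, so changing it requires no retraining.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GuardSchema:
    """Hypothetical policy schema; labels are plain data supplied per request."""
    safety_labels: Dict[str, List[str]]  # label -> trigger keywords (stand-in for model scoring)
    pii_patterns: Dict[str, str]         # entity type -> regex (stand-in for span extraction)


def run_guard(text: str, schema: GuardSchema) -> dict:
    """Return safety labels and PII spans for one text in a single call."""
    lowered = text.lower()
    flagged = sorted(
        label
        for label, words in schema.safety_labels.items()
        if any(word in lowered for word in words)
    )
    spans: List[Tuple[str, str]] = [
        (entity_type, match.group(0))
        for entity_type, pattern in schema.pii_patterns.items()
        for match in re.finditer(pattern, text)
    ]
    return {"safety": flagged, "pii": spans}


# Initial policy: one safety label and one PII type.
schema_v1 = GuardSchema(
    safety_labels={"violence": ["attack"]},
    pii_patterns={"email": r"[\w.+-]+@[\w-]+\.[\w.]+"},
)

# Policy change: add a label by editing the schema, nothing else.
schema_v2 = GuardSchema(
    safety_labels={**schema_v1.safety_labels, "fraud": ["wire the money"]},
    pii_patterns=schema_v1.pii_patterns,
)

message = "Plan the attack, then wire the money to bob@example.com"
print(run_guard(message, schema_v1))
print(run_guard(message, schema_v2))
```

In a real deployment the matcher would be replaced by the model's forward pass; the sketch only shows the calling convention.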
TOOL · arXiv cs.AI · English (EN) · 6d

Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

Researchers have developed CPD Online, a method for detecting adversarial prompts that attempt to jailbreak large language models. It treats detection as an online change-point detection problem, analyzing sequential changes in the entropy of the model's token predictions. CPD Online is model-agnostic, requires no training, and can pinpoint where the malicious portion of a prompt begins. It outperforms existing perplexity-based detectors on various open-weight models.

IMPACT This new detection method could enhance the safety of LLMs by identifying and mitigating malicious prompts, potentially reducing the need for extensive guardrail interventions.
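
To make the change-point framing concrete, here is a minimal Python sketch. It is not the CPD Online algorithm, whose details the summary does not give; it swaps in a generic one-sided CUSUM detector over a per-token entropy series. The function names, the warm-up baseline, the `drift` and `threshold` values, and the choice to watch for an upward shift in entropy are all assumptions made for illustration.

```python
import math
from typing import Iterable, Optional, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in nats of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cusum_onset(
    entropies: Iterable[float],
    warmup: int = 5,
    drift: float = 0.1,
    threshold: float = 2.0,
) -> Optional[int]:
    """Estimate where an upward entropy shift starts, or return None.

    The first `warmup` values set the baseline mean. After that, a running
    sum of (entropy - baseline - drift) is kept, clipped at zero. When the
    sum exceeds `threshold`, the index at which it last left zero is
    returned as the estimated onset.
    """
    if warmup < 1:
        raise ValueError("warmup must be at least 1")
    baseline_sum = 0.0
    stat = 0.0
    candidate: Optional[int] = None
    for i, h in enumerate(entropies):
        if i < warmup:
            baseline_sum += h  # still estimating the baseline
            continue
        baseline = baseline_sum / warmup
        new_stat = max(0.0, stat + (h - baseline - drift))
        if stat == 0.0 and new_stat > 0.0:
            candidate = i  # the statistic starts rising here
        stat = new_stat
        if stat > threshold:
            return candidate
    return None


# Toy series: ten low-entropy positions, then five high-entropy positions.
series = [1.0] * 10 + [3.0] * 5
print(cusum_onset(series))  # 10 for this toy series
```

In practice the entropy series would come from applying `shannon_entropy` to the model's next-token distribution at each prompt position. Because the detector needs only those distributions, a sketch like this stays model-agnostic and training-free, in the same spirit as the method described above.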
