A new research paper evaluates 14 open-source safety guard models on a benchmark of more than 79,000 samples spanning eight safety categories. The study finds that model size does not correlate with safety detection performance: a smaller model, Qwen Guard (4B parameters), achieved the highest recall at 83.97%. Larger models such as Llama Guard and GPT-OSS Safeguard missed a significant portion of unsafe content, which underscores recall as a critical metric for safety applications.
AI IMPACT Highlights that smaller, specialized models can outperform larger general-purpose ones in safety detection, guiding practical selection for production deployments.
RANK_REASON The cluster contains an academic paper evaluating open-source models.
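To make the emphasis on recall concrete, here is a minimal sketch of how recall differs from accuracy for a guard model. The labels below are made-up toy data for illustration only, not results from the paper, and the function names are hypothetical.

```python
def recall(y_true, y_pred):
    """Fraction of truly unsafe samples (label 1) that the guard flagged as unsafe."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total_unsafe = true_pos + false_neg
    return true_pos / total_unsafe if total_unsafe else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples, safe or unsafe, that were labeled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if y_true else 0.0


# Toy labels (assumed, illustrative only): 1 = unsafe, 0 = safe.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # the guard misses two unsafe samples

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"recall   = {recall(y_true, y_pred):.2f}")
```

In this toy case the guard is right on 8 of 10 samples yet catches only 2 of the 4 unsafe ones, which is why a safety evaluation would likely weight recall over overall accuracy.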
- BeaverTails
- GPT-OSS Safeguard
- HarmBench
- Llama Guard
- NIST AI Risk Framework
- Qwen Guard
- RealToxicityPrompts
- StrongREJECT
AI-generated summary · Google Gemini · from 1 source.