Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 6h

CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment

Researchers have developed CHILLGuard, a safety guardrail designed specifically for Chinese Large Language Models (LLMs). It addresses the limitations of existing guardrails with a fine-grained risk taxonomy tailored to Chinese regulatory policies and cultural nuances. To overcome the scarcity of relevant training data, the authors built a scalable multi-stage data construction pipeline, which produced a training set of over 400,000 samples and a test set of over 50,000 samples. In the reported experiments, CHILLGuard outperforms existing guardrail models, including Qwen3Guard-8B-Strict, by a notable margin.

IMPACT: Enhances safety and regulatory compliance for Chinese LLMs, potentially enabling broader adoption in sensitive applications.

Hugging Face
arXiv
CHILLGuard
Chinese LLM
Model-aware Direct Preference Optimization
Qwen3Guard-8B-Strict