CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment
Researchers have developed CHILLGuard, a safety guardrail designed specifically for Chinese Large Language Models (LLMs). It addresses the limitations of existing guardrails with a fine-grained risk taxonomy tailored to Chinese regulatory policies and cultural nuances. To overcome the scarcity of relevant training data, the researchers used a scalable multi-stage data construction pipeline, which produced a training set of over 400,000 samples and a test set of over 50,000 samples. Experiments show that CHILLGuard outperforms existing models, including Qwen3Guard-8B-Strict, by a notable margin.
AI IMPACT: Enhances safety and regulatory compliance for Chinese LLMs, potentially enabling broader adoption in sensitive applications.