DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

DriftGuard framework improves toxicity moderation with safety-aware drift detection

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

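Researchers have developed DriftGuard, a new framework for making automated toxicity moderation systems more robust. It uses safety-aware, multi-monitor drift detection to identify evolving harmful behavior, including coded language and shifts in targeted groups that conventional drift detectors may miss. When it detects a significant shift, DriftGuard selectively updates the moderation model using a prioritized adaptation set that concentrates on likely false negatives and high-risk examples. In experiments on datasets including Civil Comments and DynaHate, DriftGuard substantially improved toxicity recall and accuracy over baseline methods.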
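The summary describes the mechanism only at a high level. Below is a minimal, hypothetical Python sketch of the general pattern it names (several drift monitors feeding one alarm, followed by a prioritized adaptation set); it is not DriftGuard's actual method. The three monitors (a Population Stability Index on model scores, a novel-token rate as a crude proxy for coded language, and a change in the share of high-risk examples as a proxy for shifting targets), the `Example` type, every function name, and all thresholds are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


# Hypothetical record type: the summary does not specify DriftGuard's data model.
@dataclass
class Example:
    text: str
    score: float     # moderation model's toxicity probability in [0, 1]
    high_risk: bool  # assumed flag, e.g. the text mentions a targeted group


def psi(ref, cur, bins=10):
    """Population Stability Index between two samples of scores in [0, 1]."""
    eps = 1e-6  # smoothing so empty bins do not produce log(0)

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int(x * bins), bins - 1)] += 1
        return [(c + eps) / (len(xs) + bins * eps) for c in counts]

    p, q = hist(ref), hist(cur)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def novel_token_rate(ref_texts, cur_texts):
    """Share of current tokens never seen in the reference window
    (a crude stand-in for emerging coded language)."""
    vocab = {tok for t in ref_texts for tok in t.lower().split()}
    cur = [tok for t in cur_texts for tok in t.lower().split()]
    return sum(tok not in vocab for tok in cur) / max(len(cur), 1)


def high_risk_share_shift(ref, cur):
    """Absolute change in the fraction of high-risk examples
    (a crude stand-in for shifting targets)."""
    def share(xs):
        return sum(e.high_risk for e in xs) / max(len(xs), 1)
    return abs(share(cur) - share(ref))


def drift_alarm(ref, cur, psi_thr=0.2, novelty_thr=0.3, target_thr=0.1):
    """Run every hypothetical monitor; raise the alarm if any one fires."""
    signals = {
        "score_psi": psi([e.score for e in ref], [e.score for e in cur]) > psi_thr,
        "novel_tokens": novel_token_rate([e.text for e in ref],
                                         [e.text for e in cur]) > novelty_thr,
        "target_shift": high_risk_share_shift(ref, cur) > target_thr,
    }
    return any(signals.values()), signals


def adaptation_set(pool, k, threshold=0.5, margin=0.2):
    """Pick up to k examples for a selective update. Near misses (scored just
    below the decision threshold, i.e. likely false negatives) rank first,
    then high-risk examples, then higher scores as a tie-breaker."""
    def priority(e):
        near_miss = threshold - margin <= e.score < threshold
        return (near_miss, e.high_risk, e.score)
    return sorted(pool, key=priority, reverse=True)[:k]
```

Combining the monitors with a simple OR favors catching drift early at the cost of more false alarms; the summary does not say how DriftGuard actually aggregates its monitors or weights its adaptation set.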

Impact: Improves the robustness and adaptability of AI content-moderation systems in dynamic online environments.

Ranking rationale: This is a research paper presenting a new framework for toxicity moderation.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Yuting Xin, Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu · 2026-06-30 04:00

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

arXiv:2606.28725v1 Announce Type: new Abstract: Automated toxicity moderation systems operate in dynamic online environments where harmful behavior evolves through coded language, shifting targets, and strategic adaptation to enforcement. Existing drift detection methods often fo…
