New ToxiREX dataset tackles implicit toxicity in six languages

By PulseAugur Editorial · [2 sources] · 2026-06-26 11:30

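Researchers have introduced ToxiREX, a new multilingual dataset designed to capture implicit and context-dependent toxicity in online conversations. It consists of Reddit comment threads annotated with a structured toxic reasoning schema and covers six languages. By taking conversational context into account, ToxiREX aims to support a more fine-grained understanding of toxicity, a feature that earlier datasets lacked. Initial experiments show that language models perform better than random guessing on the task but still leave substantial room for improvement.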

Impact: The dataset could improve LLM safety by enabling better detection of subtle, context-dependent toxic language.

Ranking rationale: This cluster describes a new academic dataset and its associated research paper.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Stefan F. Schouten, Ilia Markov, Piek Vossen · 2026-06-29 04:00

ToxiREX: A Dataset on Toxic REasoning in ConteXt

arXiv:2606.27981v1 (announce type: new). Abstract: We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic …

arXiv cs.CL TIER_1 English(EN) · Piek Vossen · 2026-06-26 11:30

ToxiREX: A Dataset on Toxic REasoning in ConteXt

We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic toxic reasoning schema developed in a previous p…
