PulseAugur
English(EN) A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

New Research Tackles Toxicity Detection and Mitigation in Multilingual LLMs

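Two new research papers examine methods for detecting and mitigating toxicity in large language models (LLMs), with a particular focus on multilingual settings. The first surveys existing strategies for identifying and reducing harmful output across languages, highlighting challenges such as uneven language coverage and culturally specific definitions of harm. The second introduces ToxSearch-S, a distributed evolutionary search algorithm for finding adversarial prompts that elicit toxic responses; with an MPI implementation and improved toxicity detection, it shows efficiency gains over existing methods.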

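Impact: These advances in toxicity detection and mitigation may help LLMs be deployed more safely and reliably across different language communities.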

Ranking rationale: Two academic papers published on arXiv that detail new approaches to LLM safety research.


AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Soham Dan, Himanshu Beniwal, Thomas Hartvigsen ·

    A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

    arXiv:2606.25380v1. Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for mul…

  2. arXiv cs.CL TIER_1 English(EN) · Thomas Hartvigsen ·

    A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

    Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for multilingual LLMs. We first catalogue threat models…

  3. arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Travis Desell ·

    Distributed Quality-Diversity Search for Toxicity in Large Language Models

    Large Language Models remain vulnerable to adversarial prompts that elicit harmful responses, and scaling red-teaming to cover a broad range of failure modes is constrained by the cost of text generation and evaluation. We present ToxSearch-S, a speciated extension of toxi…