English(EN) AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

NVIDIA Releases Nemotron 3.5 for Multimodal AI Safety

By the PulseAugur editorial team · [8 sources] · 2026-06-01 17:36

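NVIDIA has released Nemotron 3.5 Content Safety, an AI model designed to identify and mitigate harmful content in text and images. The new version strengthens multimodal understanding, supports more than 140 languages with strong zero-shot generalization, and lets enterprises customize policy enforcement to their specific needs. It also includes an auditable reasoning trace feature, and its multimodal safety dataset has been publicly released.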

Impact: Strengthens enterprise AI safety through customizable multimodal content moderation and reasoning capabilities.

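Ranking rationale: NVIDIA released Nemotron 3.5, a new version of its content safety model with enhanced multimodal and multilingual capabilities, which qualifies as a model release.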


AI-generated summary · Google Gemini · from 8 sources.

Coverage sources [8]

Hugging Face Blog TIER_1 English(EN) · 2026-06-04 18:57

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprises
arXiv cs.CL TIER_1 English(EN) · David Gringras · 2026-06-05 04:00

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

arXiv:2604.07709v4 Announce Type: replace-cross Abstract: A heavily safety-trained model will hand a physician the full, patient-followable benzodiazepine taper and refuse it to the patient who needs it, over identical clinical facts; the knowledge is present either way. IatroBen…
arXiv cs.AI TIER_1 English(EN) · Yanjing Ren, Reza Ebrahimi, TengTeng Ma · 2026-06-04 04:00

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

arXiv:2606.04867v1 Announce Type: new Abstract: As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark d…
arXiv cs.AI TIER_1 English(EN) · TengTeng Ma · 2026-06-03 13:33

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annot…
LessWrong (AI tag) TIER_1 English(EN) · draganover · 2026-06-05 16:27

Lessons from Building an AI Safety Research Team from Scratch

This post’s goal is to distill our takeaways from building a new research team over the past four months. We describe some context about our team, how it came about, and then describe the lessons learned. …
LessWrong (AI tag) TIER_1 English(EN) · Austin Chen · 2026-06-03 21:50

Sixteen Proposals for AI Safety

These days, I often run across whippersnappers (https://generatorresidency.org/) excited to do something for AI safety, but aren’t quite sure what. One of the fun things about the Future Fund era wer…
LessWrong (AI tag) TIER_1 English(EN) · MichaelDickens · 2026-06-01 17:36

We Need Breadth-First AI Safety Plans

Cross-posted from my website (https://mdickens.me/2026/06/01/breadth-first_AI_safety_plans/). Depth-first plans lay out a path from here to aligned superintelligent AI. We need those kinds of plans. But depth-first plans depend on m…
dev.to — LLM tag TIER_1 English(EN) · soy · 2026-06-06 21:33

Local Models Orchestration, Personal AI Infrastructure & Multimodal Safety

Today's Highlights: This week features practical guides for orchestrating small, open-weight models for complex tasks, a trending GitHub project for building self-hosted persona…
