PulseAugur

OpenAI trains AI models to behave beneficially and consistently across domains

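OpenAI has published research on a new method for training AI models to maintain beneficial traits across a wide range of situations, including under adversarial pressure. The approach, referred to as beneficial reinforcement learning (Beneficial RL), applies reinforcement learning to realistic conversations to instill qualities such as truthfulness, humility, and fairness. Early tests indicate that models trained this way show better alignment and safety across many domains, including domains not explicitly covered by the training data, and are more resistant to harmful prompts.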

Impact: This research could lead to more reliable and trustworthy AI systems that stay safe and beneficial in novel and challenging scenarios.

Ranking rationale: OpenAI research paper on a new AI training method.

AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    This is an early step toward more robustly beneficial and aligned models: training models to carry beneficial traits into new situations, so as AI becomes more capable, it also becomes more reliable, transparent, and helpful for people.

  2. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    We also tested whether alignment persisted under pressure. The model was harder to steer toward harmful behavior with adversarial prompts, while remaining responsive to helpful instructions. We saw preliminary evidence of greater resistance to harmful fine-tuning. https://t.co…

  3. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    The most interesting test was cross-domain transfer. When beneficial behavior training was limited to health conversations, the model still improved on non-health evaluations of misalignment, deception, and reward hacking—even though those tasks looked very different from the ht…

  4. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    A small amount of this data produced broad gains beyond the training scenarios. Compared with a compute-matched baseline, the trained model improved on 44 of 53 independent evaluations of alignment and benefits, spanning deception, reward hacking, safety, health, and mental http…

  5. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    We trained models with reinforcement learning on realistic conversations to reinforce beneficial traits like truthfulness, humility under uncertainty, openness to correction, fairness, and concern for human welfare, across 12 domains, including health, science, and education. htt…

  6. X (OpenAI) · TIER_1 · English (EN) · OpenAI

    As AI takes on longer, higher-stakes tasks, we want models to carry beneficial and safe behavior into new domains beyond their training—and maintain it under pressure. That’s the idea behind our new research on training models to be broadly and persistently beneficial.