PulseAugur

OpenAI publishes Deployment Simulation to predict AI model behavior

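OpenAI has developed a method called Deployment Simulation for anticipating, before release, how an AI model will behave in real-world use. The technique simulates deployment by running recent, de-identified user requests through a candidate model and studying its responses. Across 20 behavior categories and three GPT-5-series Thinking deployments, simulated rates correlated strongly with observed rates. Traditional evaluations and red-teaming remain essential, but the simulation is meant to estimate how often undesired behaviors occur and to surface new issues before deployment.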

Impact: By identifying potential problems before a model is widely deployed, this simulation method could improve AI safety.

Ranking rationale: OpenAI is sharing new research on predicting model behavior before release.

AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. X — OpenAI TIER_1 English(EN) · OpenAI ·

    Deployment Simulation works best with representative production data, which external evaluators often can’t access. In a companion post for our Alignment blog, we also explore the public WildChat dataset and find that, while less precise, it still provides a useful signal about

  2. X — OpenAI TIER_1 English(EN) · OpenAI ·

    Simulated deployments also reduced evaluation awareness to levels close to real production traffic. We extended the method to agentic deployments with stateful tools, showing that tool simulators can produce realistic trajectories when given sufficient context and capabilities.…

  3. X — OpenAI TIER_1 English(EN) · OpenAI ·

    Across 20 behavior categories and three GPT-5-series Thinking deployments, simulated and observed rates were strongly correlated. The method outperformed challenging-prompt and previous-deployment baselines at predicting whether rates would rise or fall—and by how much. https:/…

  4. X — OpenAI TIER_1 English(EN) · OpenAI ·

    Traditional evaluations and red-teaming remain essential, especially for rare or severe risks. Deployment Simulation complements them by helping us estimate how often undesired behaviors may occur in realistic use and surface new behaviors before release.

  5. X — OpenAI TIER_1 English(EN) · OpenAI ·

    For this research, we analyzed only ChatGPT conversations from users who allow their data to be used to improve models. Before analysis, we removed account-linked identifiers and identifiable information, and we report only aggregate findings. https://t.co/zF14BHFgKw

  6. X — OpenAI TIER_1 English(EN) · OpenAI ·

    We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate model responses. https://t.co/7RJzBfNniQ
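
The comparison described in the sources (simulated versus observed behavior rates per category, and whether a rate rises or falls relative to the previous deployment) can be illustrated with a small sketch. This is not OpenAI's implementation: the function names, the four-category setup, and every number below are hypothetical placeholders chosen only to show the arithmetic, and Pearson correlation is assumed here as one plausible way to measure "strongly correlated".

```python
from math import sqrt


def rate(flags):
    """Fraction of responses flagged for a behavior (flags are booleans)."""
    return sum(flags) / len(flags)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of rates."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def direction_agreement(previous, simulated, observed):
    """Share of categories where the simulation predicts the same direction
    of change (rise or fall versus the previous deployment) as was observed."""
    hits = 0
    for prev, sim, obs in zip(previous, simulated, observed):
        if (sim - prev) * (obs - prev) > 0 or (sim == prev and obs == prev):
            hits += 1
    return hits / len(previous)


# Hypothetical per-category rates; NOT figures from the research.
previous_rates = [0.010, 0.020, 0.005, 0.030]   # previous deployment
simulated_rates = [0.008, 0.025, 0.004, 0.028]  # candidate model, simulated
observed_rates = [0.009, 0.024, 0.006, 0.027]   # candidate model, after release

correlation = pearson(simulated_rates, observed_rates)
agreement = direction_agreement(previous_rates, simulated_rates, observed_rates)
print(f"correlation={correlation:.3f}, direction agreement={agreement:.2f}")
```

In practice each rate would come from something like `rate(flags)` applied to graded responses for one behavior category; how responses are graded is not described in the sources and is left out of the sketch.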