PulseAugur

OpenAI Publishes Deployment Simulation Method for Predicting AI Model Behavior

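OpenAI has developed a new method, Deployment Simulation, for anticipating how AI models will behave in real-world use before they are released. The technique simulates deployment conditions by running a candidate model on recent, de-identified user requests. Across 20 behavior categories and three GPT-5-series Thinking deployments, simulated and observed behavior rates were strongly correlated. Traditional evaluations remain essential, but the simulation approach aims to estimate how often undesired behaviors occur and to surface new issues before deployment.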
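As a rough illustration of the method summarized above, a simulated deployment can be sketched as replaying a sample of requests through a candidate model and counting flagged responses per behavior category. This assumes a callable that returns the candidate model's response and a grader that flags behavior categories; `simulate_deployment`, `generate`, and `flag_behaviors` are hypothetical stand-ins, not OpenAI tooling. The Wilson score interval attached to each rate is also an assumed choice for expressing uncertainty; the sources do not say how OpenAI computes error bars.

```python
from math import sqrt
from typing import Callable, Iterable


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k flagged responses out of n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def simulate_deployment(
    requests: Iterable[str],
    generate: Callable[[str], str],
    flag_behaviors: Callable[[str, str], set[str]],
    categories: list[str],
) -> dict[str, tuple[float, float, float]]:
    """Replay requests through a candidate model; return (rate, low, high) per category.

    Hypothetical sketch: `generate` stands in for the candidate model and
    `flag_behaviors` for a behavior grader. Neither is a real OpenAI API.
    """
    counts = {c: 0 for c in categories}
    n = 0
    for request in requests:
        response = generate(request)
        # Each response counts at most once per flagged category.
        for category in flag_behaviors(request, response):
            if category in counts:
                counts[category] += 1
        n += 1
    results = {}
    for category, k in counts.items():
        low, high = wilson_interval(k, n)
        results[category] = (k / n, low, high)
    return results


if __name__ == "__main__":
    # Toy stand-ins: every tenth request gets flagged. Not real data.
    sample = [f"request {i}" for i in range(100)]
    rates = simulate_deployment(
        sample,
        generate=lambda req: "response to " + req,
        flag_behaviors=lambda req, resp: (
            {"hypothetical_category"} if int(req.split()[1]) % 10 == 0 else set()
        ),
        categories=["hypothetical_category"],
    )
    print(rates)
```

Rare behaviors may appear only a handful of times in a sample, so an interval is arguably more informative than a bare rate. This is consistent with the sources' point that red-teaming remains essential for rare or severe risks.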

Impact: By flagging potential problems before a model is widely deployed, this simulation approach could improve AI safety.

Ranking rationale: OpenAI is sharing new research on anticipating model behavior before release.

AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. X · OpenAI (TIER_1, English)

    Deployment Simulation works best with representative production data, which external evaluators often can’t access. In a companion post for our Alignment blog, we also explore the public WildChat dataset and find that, while less precise, it still provides a useful signal about…

  2. X · OpenAI (TIER_1, English)

    Simulated deployments also reduced evaluation awareness to levels close to real production traffic. We extended the method to agentic deployments with stateful tools, showing that tool simulators can produce realistic trajectories when given sufficient context and capabilities.…

  3. X · OpenAI (TIER_1, English)

    Across 20 behavior categories and three GPT-5-series Thinking deployments, simulated and observed rates were strongly correlated. The method outperformed challenging-prompt and previous-deployment baselines at predicting whether rates would rise or fall—and by how much. https:/…

  4. X · OpenAI (TIER_1, English)

    Traditional evaluations and red-teaming remain essential, especially for rare or severe risks. Deployment Simulation complements them by helping us estimate how often undesired behaviors may occur in realistic use and surface new behaviors before release.

  5. X · OpenAI (TIER_1, English)

    For this research, we analyzed only ChatGPT conversations from users who allow their data to be used to improve models. Before analysis, we removed account-linked identifiers and identifiable information, and we report only aggregate findings. https://t.co/zF14BHFgKw

  6. X · OpenAI (TIER_1, English)

    We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate model responses. https://t.co/7RJzBfNniQ
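Source 3 reports that simulated and observed rates were strongly correlated and that the method beat the baselines at predicting whether rates would rise or fall, and by how much. The sketch below shows one way such a comparison could be scored. It assumes per-category rates for a previous model, simulated rates for the new model, and the new model's observed rates. The function names, the use of Pearson correlation, sign agreement, and mean absolute error, and the "predict the previous rate" baseline are all assumptions, not OpenAI's published metrics.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def sign(v: float) -> int:
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and actual rates."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def compare_rates(
    previous: list[float], simulated: list[float], observed: list[float]
) -> dict[str, float]:
    """Score simulated per-category rates against observed ones (hypothetical metrics)."""
    if not (len(previous) == len(simulated) == len(observed)):
        raise ValueError("all inputs must cover the same categories")
    # Did the simulation predict the direction of change relative to the previous model?
    hits = sum(
        sign(s - p) == sign(o - p)
        for p, s, o in zip(previous, simulated, observed)
    )
    return {
        "pearson_r": pearson(simulated, observed),
        "direction_accuracy": hits / len(observed),
        "mae_simulation": mae(simulated, observed),
        # Assumed baseline: each rate stays at the previous model's level.
        "mae_previous_baseline": mae(previous, observed),
    }


if __name__ == "__main__":
    # Toy rates for three hypothetical categories; not data from the research.
    previous = [0.010, 0.020, 0.005]
    simulated = [0.012, 0.015, 0.006]
    observed = [0.013, 0.016, 0.004]
    print(compare_rates(previous, simulated, observed))
```

A baseline that repeats the previous model's rates can never predict a rise or fall, so here it is compared only on absolute error. The challenging-prompt baseline mentioned in source 3 is not modeled.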