PulseAugur
frontier release · [12 sources] ·

OpenAI releases o3 and o4-mini models with advanced reasoning and tool capabilities

OpenAI has released its o3 and o4-mini models, which combine stronger reasoning with full tool access in ChatGPT. OpenAI positions o3 as its most powerful reasoning model, reporting state-of-the-art results on complex tasks in coding, math, science, and visual perception. The o4-mini model is a more cost-efficient option optimized for speed and high throughput, with particularly strong performance in math and coding. OpenAI is also replacing the GPT-4o-based model behind Operator, its agent for web-based tasks, with a version based on o3 (the API version remains based on 4o), and has introduced Codex, a cloud-based coding agent powered by codex-1, a version of o3 optimized for software engineering.

Summary written by gemini-2.5-flash-lite from 12 sources.

RANK_REASON This cluster details the release of new frontier models (o3, o4-mini) and specialized agentic models (Operator, Codex) by a tier-1 lab (OpenAI), along with associated system cards and evaluation results.


COVERAGE [12]

  1. OpenAI News TIER_1 Português(PT) ·

    Addendum to OpenAI o3 and o4-mini system card: OpenAI o3 Operator

    We are replacing the existing GPT-4o-based model for Operator with a version based on OpenAI o3. The API version will remain based on 4o.

  2. OpenAI News TIER_1 Português(PT) ·

    Addendum to o3 and o4-mini system card: Codex

    Codex is a cloud-based coding agent. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. codex-1 was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style …

  3. OpenAI News TIER_1 ·

    OpenAI o3 and o4-mini System Card

    OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning with full tool capabilities—web browsing, Python, image and file analysis, image generation, canvas, automations, file search, and memory.

  4. OpenAI News TIER_1 ·

    Introducing OpenAI o3 and o4-mini

    Our smartest and most capable models to date with full tool access

  5. OpenAI News TIER_1 ·

    OpenAI o3-mini System Card

    This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

  6. OpenAI News TIER_1 ·

    OpenAI o3-mini

  7. OpenAI News TIER_1 ·

    OpenAI o1 System Card

    This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.

  8. OpenAI News TIER_1 ·

    OpenAI o1-mini

    Advancing cost-efficient reasoning

  9. METR (Model Evaluation & Threat Research) TIER_1 Română(RO) ·

    OpenAI o3 and o4-mini Evaluation Results

  10. METR (Model Evaluation & Threat Research) TIER_1 Română(RO) ·

    OpenAI o3 and o4-mini Evaluation Results

    Executive Summary: As described in OpenAI’s o3 System Card, METR received access to earlier checkpoints of o3 and o4-mini from OpenAI three weeks prior to model release. To assist with our evaluations, OpenAI also provided us wi…

  11. METR (Model Evaluation & Threat Research) TIER_1 ·

    Details about METR’s preliminary evaluation of o1-preview

    METR received access to OpenAI o1-mini on August 28th and to OpenAI o1-preview on September 3rd, and evaluated these models for autonomous capabilities and AI R&D ability until September 9th. The tasks used in the general autonomy evaluations were run similarly t…

  12. Smol AINews TIER_1 ·

    OpenAI o3, o4-mini, and Codex CLI

    OpenAI launched the o3 and o4-mini models, emphasizing improvements in reinforcement-learning scaling and overall efficiency, making o4-mini cheaper and better across prioritized metrics. These models showcase enhanced vision and tool use capabilities,…