OpenAI has introduced its latest visual reasoning models, o3 and o4-mini, which can "think with images" as part of their internal reasoning process. The models natively perform image manipulations such as cropping and zooming, improving ChatGPT's ability to analyze complex visual data. The result is state-of-the-art performance on multimodal benchmarks, particularly in STEM question-answering and visual search, marking a significant step toward more capable multimodal AI agents.
Summary written by gemini-2.5-flash-lite from 1 source.