GPT-5.4

PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.

Total · 30d

125

125 over 90d

Releases · 30d

1 over 90d

Papers · 30d

70 over 90d

TIER MIX · 90D

frontier release 2
significant 8
research 44
tool 60
commentary 11

TOPICS

product 74
paper 70
model release 63
safety 30
other 18
infra 14
opinion 2
funding 2

RELATIONSHIPS

subsidiary of OpenAI 100%
developed by OpenAI 100%
instance of large language models 90%
used by codex 90%
developed by Microsoft Research 90%
competes with DeepSeek 80%
competes with Claude Opus-4.6 70%
competes with Gemini 3.1 Pro 70%
competes with Claude Sonnet 4.6 70%
authored by arXiv 70%
used by arXiv 70%
competes with Claude Opus 4.7 70%

TIMELINE

2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.

SENTIMENT · 30D

25 day(s) with sentiment data

RECENT · PAGE 6/7 · 125 TOTAL

RESEARCH · CL_11817 · May 1 · 04:00

GPT-5.4 leads LLMs in new EU digital battery passport conformance task

Researchers have introduced BatteryPass-12K, the first dataset designed for classifying digital battery passport conformance, in anticipation of the EU's upcoming battery regulation. They evaluated 22 language models, f…

RESEARCH · CL_11687 · May 1 · 04:00

AI agent swarms may fail due to 'Inverse-Wisdom Law,' study finds

A new paper introduces the Inverse-Wisdom Law, challenging the assumption that AI agent swarms benefit from the "Wisdom of the Crowd." The research demonstrates that these swarms can prioritize internal architectural ag…

RESEARCH · CL_11488 · Apr 30 · 15:01

New VeriGround model achieves reliable circuit-to-Verilog code generation

Researchers have identified a significant reliability issue in multimodal large language models (MLLMs) when generating hardware description language (HDL) code from circuit diagrams. This "Mirage" phenomenon occurs whe…

TOOL · CL_09121 · Apr 29 · 13:47

Lingo.dev launches v1.0 with AI-powered localization engine

Lingo.dev has launched version 1.0 of its localization platform, introducing retrieval augmented localization (RAL). This approach injects glossary context and brand voice rules into LLM requests to improve translation …

SIGNIFICANT · CL_08510 · Apr 29 · 04:10

AWS launches Amazon Quick, integrates OpenAI models into Bedrock

Amazon Web Services has launched Amazon Quick, an AI agent designed to integrate with local files, emails, and applications to streamline workflows. The company also announced a deeper partnership with OpenAI, bringing …

FRONTIER RELEASE · CL_08402 · Apr 29 · 00:52

Xiaomi open-sources MiMo-V2.5 AI models, showcasing macOS simulation and high token efficiency

Xiaomi has officially open-sourced its MiMo-V2.5 series of AI models, including the flagship MiMo-V2.5 Pro agent model. These models demonstrate strong performance, rivaling top closed-source models like Claude Opus 4.6…

FRONTIER RELEASE · CL_07657 · Apr 28 · 12:16

Xiaomi's MiMo-v2.5-Pro open-source model rivals top AI coding assistants

Xiaomi has released MiMo-v2.5-Pro, an open-source coding-focused language model that demonstrates impressive capabilities in complex tasks. The model successfully completed a university-level compiler project in hours, …

RESEARCH · CL_06722 · Apr 28 · 04:00

Frontier LLMs like GPT-5.4 and Claude Opus 4.7 show significant verbal tics

A new paper analyzes the prevalence of verbal tics, such as repetitive phrases and sycophantic openers, in eight leading large language models. Researchers developed a Verbal Tic Index (VTI) to quantify these tics, find…

RESEARCH · CL_08361 · Apr 27 · 23:48

Claude Opus 4.7 leads frontier agents in AI research acceleration benchmark

A new research paper proposes a benchmark to assess AI's ability to autonomously implement machine learning pipelines, aiming to detect early signs of recursive self-improvement. Frontier coding agents were tasked with …

RESEARCH · CL_04389 · Apr 26 · 20:01

GPT-5.4 and Claude Opus 4.6 fail banking benchmark, scoring 0% client-ready outputs

A new benchmark called BankerToolBench has revealed significant shortcomings in current large language models when applied to financial tasks. GPT-5.4, Claude Opus 4.6, and other models were tested on simulated junior i…

FRONTIER RELEASE · CL_03105 · Apr 25 · 05:00

DeepSeek releases V4 Pro and Flash models with 1M context, runs on Huawei chips

DeepSeek has released its new V4 family of models, including V4 Pro and V4 Flash, which boast a 1 million token context window. These models were trained on 32 trillion tokens and feature a novel hybrid attention system…

RESEARCH · CL_04994 · Apr 24 · 01:52

AI models show Western bias, homogenizing values across cultures

A new study auditing large language models found that three leading systems (Claude Sonnet 4.5, GPT-5.4, and Gemini 2.5 Flash) consistently provided individualistic advice, even when presented with dilemmas from users in …

RESEARCH · CL_02960 · Apr 23 · 12:36

Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

Researchers have developed a new framework called Verbal Process Supervision (VPS) that enhances the reasoning capabilities of large language models without requiring gradient updates. This method utilizes structured na…

RESEARCH · CL_02975 · Apr 23 · 07:02

AI models evaluated on meeting summaries, GPT-5.1 shows gains

Researchers have developed a reusable pipeline for evaluating AI-generated meeting summaries, designed to be adaptable across different domains. The system treats both ground truth and AI outputs as structured artifacts…

RESEARCH · CL_02999 · Apr 22 · 22:56

AI system enhances science classroom discourse analysis using multi-task learning

Researchers have developed an automated discourse analysis system (ADAS) to classify teacher and student utterances in science classrooms, aiming to understand knowledge construction and improve teaching. The system use…

FRONTIER RELEASE · CL_03443 · Apr 21 · 00:00

Moonshot AI's Kimi K2.6 tops benchmarks, Bezos eyes $10B AI fundraise

Moonshot AI has released Kimi K2.6, a model claiming superior performance on coding and agentic benchmarks, surpassing models like GPT-5.4 and Claude Opus 4.6. Alibaba's Qwen3.6-Max-Preview also shows improved instructi…

RESEARCH · CL_17282 · Apr 17 · 15:47

OpenAI releases GPT-5.4-Cyber for cybersecurity, contrasting with Anthropic's limited Claude Mythos

OpenAI has released GPT-5.4-Cyber, a specialized version of its GPT-5.4 model, aimed at enhancing cybersecurity defenses. This model, available through OpenAI's Trusted Access for Cyber program, offers capabilities like…

RESEARCH · CL_17452 · Apr 17 · 14:09

Public AI models replicate Anthropic's vulnerability discovery findings

Researchers have successfully replicated Anthropic's Mythos findings using publicly available AI models like GPT-5.4 and Claude Opus 4.6. This suggests that advanced AI capabilities for discovering software vulnerabilit…

SIGNIFICANT · CL_02143 · Apr 13 · 06:00

OpenAI powers enterprise AI adoption with Cloudflare and Hyatt integrations

OpenAI has partnered with Hyatt to integrate ChatGPT Enterprise across the hospitality company's global operations. This collaboration aims to enhance employee productivity by automating manual tasks, allowing staff to …

FRONTIER RELEASE · CL_11191 · Apr 8 · 16:00

RT Artificial Analysis: Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Cla...

Meta AI has released Muse Spark, a new frontier-class multimodal model developed by Meta Superintelligence Labs. This marks Meta's return to the frontier AI race after a period of relative quiet and is their first model…
