ENTITY GPT-5.4

GPT-5.4

PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.


Total · 30d

125

125 over 90d

Releases · 30d

1 over 90d

Papers · 30d

70 over 90d

TIER MIX · 90D

frontier release 2
significant 8
research 44
tool 60
commentary 11

TOPICS

product 74
paper 70
model release 63
safety 30
other 18
infra 14
opinion 2
funding 2

RELATIONSHIPS

subsidiary of OpenAI 100%
developed by OpenAI 100%
instance of large-language models 90%
used by codex 90%
developed by Microsoft Research 90%
competes with DeepSeek 80%
competes with Claude Opus-4.6 70%
competes with Gemini 3.1 Pro 70%
competes with Claude Sonnet 4.6 70%
authored by arXiv 70%
used by arXiv 70%
competes with Claude Opus 4.7 70%

TIMELINE

2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.

SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 4/7 · 125 TOTAL

SIGNIFICANT · CL_42398 · May 21 · 08:36

Alibaba's Qwen 3.6 open-weight model rivals frontier AI on coding tasks

Alibaba's Qwen 3.6 model family, particularly the 27B dense variant, has demonstrated performance competitive with leading frontier models like GPT-5.4 and Claude 4.6 on coding tasks. This open-weight model, runnable on…

RESEARCH · CL_43929 · May 21 · 00:00

AI models fail to reliably forecast scientific progress, study finds

A new benchmark called CUSP has been developed to evaluate AI's ability to forecast scientific progress. The study found that current frontier AI models struggle with predicting the realization and timing of scientific …

RESEARCH · CL_41768 · May 20 · 08:37

Microsoft Security Copilot uses AI agent for autonomous threat detection

Microsoft has developed a Dynamic Threat Detection Agent (DTDA) integrated into its Security Copilot, designed to autonomously investigate security incidents and generate novel alerts. This agent utilizes a unified acti…

RESEARCH · CL_49708 · May 20 · 08:15

New attack method enhances adversarial transferability in MLLMs

Researchers have developed FRA-Attack, a novel method to improve the transferability of adversarial attacks against multimodal large language models (MLLMs). This technique utilizes frequency-domain regularization to al…

COMMENTARY · CL_40060 · May 20 · 02:31

Developer finds Claude Code Extension optimal for AI-assisted coding

A software developer details their journey to find the optimal AI coding assistant, ultimately settling on VS Code with the Claude Code Extension and a MAX plan. They found that while tools like GitHub Copilot and Curso…

RESEARCH · CL_41837 · May 19 · 21:23

LLMs struggle to simulate real human behavior, new research shows

Two new research papers explore the limitations of current large language models in simulating realistic human behavior. The first paper, "OmniBehavior," introduces a benchmark using real-world data and finds that LLMs …

TOOL · CL_39521 · May 19 · 13:53

Databricks launches beta Unity AI Gateway Guardrails for AI security

Databricks has launched a beta version of its Unity AI Gateway Guardrails, designed to enhance the security and compliance of AI applications. These guardrails help prevent sensitive data leakage, protect against malici…

TOOL · CL_40813 · May 19 · 12:59

LLMs generate gendered behaviors, impacting trust calibration in agents

Researchers have developed a method to generate multimodal behaviors for socially interactive agents, aiming to calibrate user trust based on an agent's capabilities and benevolence. The study utilized GPT-5.4 to produc…

SIGNIFICANT · CL_38042 · May 19 · 02:46

Alibaba Qwen 3.7 previews top Chinese models in text and vision benchmarks

Alibaba's Qwen team has released preview versions of its Qwen 3.7 Max and Qwen 3.7 Plus models, showcasing rapid iteration cycles. The Qwen 3.7 Max model has achieved top rankings among Chinese models in text-based benc…

TOOL · CL_49340 · May 18 · 22:20

AI agents struggle with research rigor despite generating papers

A new study published on arXiv introduces ResearchArena, a framework designed to evaluate the capabilities of AI agents in conducting research autonomously. The system allowed agents like Claude Code, Codex, and Kimi Co…

TOOL · CL_37440 · May 18 · 16:43

Cursor launches Composer 2.5 AI coding assistant with enhanced intelligence

Cursor has released Composer 2.5, an updated AI coding assistant that offers improved intelligence and reliability for long-running tasks. This new version is built upon Moonshot AI's Kimi K2.5 architecture and incorpor…

RESEARCH · CL_37949 · May 18 · 10:19

AI systems take top spots in EgoVis 2026 challenges

Two research teams have presented technical reports for challenges at the EgoVis 2026 conference. One team, JFAA, secured first place in the EPIC-KITCHENS-100 Action Anticipation Challenge using a JEPA-based method for …

FRONTIER RELEASE · CL_34433 · May 16 · 11:51

DeepSeek V4 launches with 1.6T MoE, 1M context, and lower costs

DeepSeek V4, an open-weight model family, has been released with a 1.6-trillion-parameter Mixture-of-Experts architecture that activates only 49 billion parameters per token. This new model boasts a 1-million-token cont…

COMMENTARY · CL_34131 · May 16 · 05:25

Open-weight AI models cost developers fraction of traditional inference

A developer detailed their experience using open-weight AI models for a coding project, incurring a cost of only $5 for over 400 million tokens via a subscription service. This contrasts sharply with the estimated $138.…

TOOL · CL_29625 · May 13 · 04:08

New benchmark tests AI agents on complex, iterative engineering tasks

A new benchmark, Frontier-Eng Bench, has been released to evaluate AI agents on complex engineering tasks that lack standardized answers. This benchmark moves beyond simple problem-solving by requiring agents to propose…

TOOL · CL_29240 · May 12 · 17:59

New benchmark CUActSpot targets complex interactions for AI agents

Researchers have introduced CUActSpot, a new benchmark designed to evaluate computer-use agents (CUAs) on complex and infrequent interactions across multiple modalities. The benchmark addresses the long-tail issue in GU…

TOOL · CL_28849 · May 12 · 17:01

No single AI model leads all benchmarks, report finds

A new report indicates that no single AI model consistently leads across all benchmarks, with different models excelling in specific areas like coding or math. The evaluation process itself is also complex, as multiple …

TOOL · CL_29373 · May 12 · 16:34

AI models fail to detect danger in long transcripts

A new paper reveals that leading AI models like Opus 4.6, GPT 5.4, and Gemini 3.1 exhibit significant performance degradation when classifying long transcripts, a crucial task for monitoring coding agents. These models …

RESEARCH · CL_29382 · May 12 · 08:39

LLMs evaluated for air traffic safety analysis

Researchers are exploring the use of large language models (LLMs) for enhancing safety in air traffic control (ATC) and around non-towered airports. One study proposes a vision-language model approach to analyze radio c…

RESEARCH · CL_36786 · May 11 · 23:15

Microsoft Research: LLMs corrupt 25% of documents in delegated tasks

A new benchmark, DELEGATE-52, developed by Microsoft Research, reveals that current large language models significantly corrupt documents during delegated workflows. Even advanced models like Gemini 3.1 Pro, Claude 4.6 …
