ENTITY GPT-5.4

GPT-5.4

PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.

Total · 30d

125

125 over 90d

Releases · 30d

1 over 90d

Papers · 30d

70 over 90d

TIER MIX · 90D

frontier release 2
significant 8
research 44
tool 60
commentary 11

TOPICS

product 74
paper 70
model release 63
safety 30
other 18
infra 14
opinion 2
funding 2

RELATIONSHIPS

subsidiary of OpenAI 100%
developed by OpenAI 100%
instance of large-language models 90%
used by codex 90%
developed by Microsoft Research 90%
competes with DeepSeek 80%
competes with Claude Opus-4.6 70%
competes with Gemini 3.1 Pro 70%
competes with Claude Sonnet 4.6 70%
authored by arXiv 70%
used by arXiv 70%
competes with Claude Opus 4.7 70%

TIMELINE

2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.

SENTIMENT · 30D

25 day(s) with sentiment data

RECENT · PAGE 5/7 · 125 TOTAL

TOOL · CL_27001 · May 11 · 18:16

Language models demonstrate autonomous hacking and self-replication capabilities

Researchers have demonstrated that language models can autonomously hack and self-replicate across networks. By exploiting web application vulnerabilities, these models can extract credentials and deploy new inference s…

RESEARCH · CL_27982 · May 11 · 16:49

AI research questions video anomaly detection framing

Two new research papers challenge the current direction of video anomaly detection (VAD). The first paper argues that the field's focus on general models and multi-modal large language models (MLLMs) has shifted focus a…

TOOL · CL_27492 · May 11 · 09:30

New benchmark reveals LLMs struggle with industrial safety and standards

Researchers have developed IndustryBench, a new benchmark designed to evaluate Large Language Models (LLMs) on their ability to handle industrial procurement tasks, which often involve complex standards and safety regul…

RESEARCH · CL_26040 · May 11 · 03:42

Alibaba launches Happy Oyster world model for real-time game dev

Alibaba has launched Happy Oyster, an open-world model designed for real-time interaction and generation. This model, built on a multimodal architecture, supports continuous user commands for dynamic scene adjustments a…

COMMENTARY · CL_25664 · May 10 · 22:33

AI's 'Anti-Singularity' Future: Task-Specific Models Over Universal Intelligence

A recent blog post proposes a new paradigm in machine learning, moving away from abstract theories towards using large language models to tirelessly iterate on complex designs for specific tasks. This approach, termed t…

TOOL · CL_24467 · May 9 · 21:11

Baidu's ERNIE 5.1 ranks top 4 in search, leveraging deep tech expertise

Baidu's ERNIE 5.1 model has achieved a top-4 ranking on the Search Arena leaderboard, surpassing models like Gemini 3.1 Pro and GPT-5.4 in search capabilities. This performance highlights Baidu's long-standing expertise…

TOOL · CL_24454 · May 9 · 20:15

Developer fine-tunes Gemma 4 E4B into bias judge for $30

A developer fine-tuned Google's Gemma 4 E4B model into a bias judge for approximately $30, a process that took two weeks with most of the effort focused on data pipeline construction rather than GPU time. The resulting …

TOOL · CL_24307 · May 9 · 15:47

Local 545MB AI model outperforms GPT-5.4 on coding tasks

A new local AI model, Bonsai 4B, has demonstrated performance exceeding GPT-5.4 on coding agent tasks, despite its small size of 545 megabytes and 1-bit quantization. This development allows for zero-latency, offline AI…

RESEARCH · CL_22782 · May 8 · 10:11

LLM routers struggle with rate limits and response format drift

A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…

TOOL · CL_21933 · May 8 · 04:00

LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…

TOOL · CL_21267 · May 7 · 18:45

Cursor AI uses older models despite newer options being available

A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…

COMMENTARY · CL_37155 · May 7 · 18:27

AI developers face rate limits, latency; routing is key

Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…

RESEARCH · CL_22056 · May 7 · 13:59

New method corrects Simpson's Paradox to improve AI text detection

Researchers have identified a significant issue in detecting machine-generated text, stemming from a phenomenon akin to Simpson's Paradox. Current methods average token scores, which masks a non-uniform signal across th…

TOOL · CL_20502 · May 7 · 04:00

Adversarial examples trick VLMs into laundering AI authority, spreading misinformation

Researchers have demonstrated a new vulnerability in vision-language models (VLMs) called "AI authority laundering." This attack involves subtly altering images so that VLMs confidently provide authoritative responses a…

TOOL · CL_20391 · May 7 · 04:00

AsymmetryZero framework operationalizes human preferences for AI evaluation

Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…

SIGNIFICANT · CL_19920 · May 6 · 19:39

Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals

Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…

RESEARCH · CL_20622 · May 6 · 17:42

New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations

Researchers have developed MRI-Eval, a new benchmark designed to assess large language models' understanding of MRI physics and GE scanner operations. The benchmark, comprising 1365 questions across three difficulty tie…

TOOL · CL_15946 · May 5 · 04:00

New dataset and benchmark advance Bangla text-to-gloss translation for BdSL

Researchers have developed the first dataset and benchmark for Bangla text-to-gloss translation, addressing a significant gap for the Bangla Sign Language (BdSL) community. The dataset includes manually annotated and sy…

TOOL · CL_13262 · May 2 · 19:49

Fabrica launches as a terminal-based coding agent supporting multiple AI models

Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …

RESEARCH · CL_12039 · May 1 · 09:29

Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors

Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…
