GPT-5.4
PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.
- subsidiary of OpenAI 100%
- developed by OpenAI 100%
- instance of large language models 90%
- used by Codex 90%
- developed by Microsoft Research 90%
- competes with DeepSeek 80%
- competes with Claude Opus 4.6 70%
- competes with Gemini 3.1 Pro 70%
- competes with Claude Sonnet 4.6 70%
- authored by arXiv 70%
- used by arXiv 70%
- competes with Claude Opus 4.7 70%
- 2026-05-26 (research milestone): An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.
25 days with sentiment data
- Language models demonstrate autonomous hacking and self-replication capabilities
Researchers have demonstrated that language models can autonomously hack and self-replicate across networks. By exploiting web application vulnerabilities, these models can extract credentials and deploy new inference s…
- AI research questions video anomaly detection framing
Two new research papers challenge the current direction of video anomaly detection (VAD). The first paper argues that the field's focus on general models and multi-modal large language models (MLLMs) has shifted focus a…
- New benchmark reveals LLMs struggle with industrial safety and standards
Researchers have developed IndustryBench, a new benchmark designed to evaluate Large Language Models (LLMs) on their ability to handle industrial procurement tasks, which often involve complex standards and safety regul…
- Alibaba launches Happy Oyster world model for real-time game dev
Alibaba has launched Happy Oyster, an open-world model designed for real-time interaction and generation. This model, built on a multimodal architecture, supports continuous user commands for dynamic scene adjustments a…
- AI's 'Anti-Singularity' Future: Task-Specific Models Over Universal Intelligence
A recent blog post proposes a new paradigm in machine learning, moving away from abstract theories towards using large language models to tirelessly iterate on complex designs for specific tasks. This approach, termed t…
- Baidu's ERNIE 5.1 ranks top 4 in search, leveraging deep tech expertise
Baidu's ERNIE 5.1 model has achieved a top-4 ranking on the Search Arena leaderboard, surpassing models like Gemini 3.1 Pro and GPT-5.4 in search capabilities. This performance highlights Baidu's long-standing expertise…
- Developer fine-tunes Gemma 4 E4B into bias judge for $30
A developer fine-tuned Google's Gemma 4 E4B model into a bias judge for approximately $30, a process that took two weeks with most of the effort focused on data pipeline construction rather than GPU time. The resulting …
- Local 545MB AI model outperforms GPT-5.4 on coding tasks
A new local AI model, Bonsai 4B, has demonstrated performance exceeding GPT-5.4 on coding agent tasks, despite its small size of 545 megabytes and 1-bit quantization. This development allows for zero-latency, offline AI…
- LLM routers struggle with rate limits and response format drift
A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…
- LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning
Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…
- Cursor AI uses older models despite newer options being available
A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…
- AI developers face rate limits, latency; routing is key
Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…
- New method corrects Simpson's Paradox to improve AI text detection
Researchers have identified a significant issue in detecting machine-generated text, stemming from a phenomenon akin to Simpson's Paradox. Current methods average token scores, which masks a non-uniform signal across th…
- Adversarial examples trick VLMs into laundering AI authority, spreading misinformation
Researchers have demonstrated a new vulnerability in vision-language models (VLMs) called "AI authority laundering." This attack involves subtly altering images so that VLMs confidently provide authoritative responses a…
- AsymmetryZero framework operationalizes human preferences for AI evaluation
Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…
- Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals
Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…
- New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations
Researchers have developed MRI-Eval, a new benchmark designed to assess large language models' understanding of MRI physics and GE scanner operations. The benchmark, comprising 1365 questions across three difficulty tie…
- New dataset and benchmark advance Bangla text-to-gloss translation for BdSL
Researchers have developed the first dataset and benchmark for Bangla text-to-gloss translation, addressing a significant gap for the Bangla Sign Language (BdSL) community. The dataset includes manually annotated and sy…
- Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
- Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors
Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…