GPT-5.4
PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.
- subsidiary of OpenAI 100%
- developed by OpenAI 100%
- instance of large-language models 90%
- used by codex 90%
- developed by Microsoft Research 90%
- competes with DeepSeek 80%
- competes with Claude Opus-4.6 70%
- competes with Gemini 3.1 Pro 70%
- competes with Claude Sonnet 4.6 70%
- authored by arXiv 70%
- used by arXiv 70%
- competes with Claude Opus 4.7 70%
- 2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.
25 days with sentiment data
- Anthropic ships Claude Opus 4.8, accelerating AI agent migration needs
Anthropic has released Claude Opus 4.8, continuing a rapid release cycle with new versions appearing every 5-7 weeks. This accelerated pace means that production agents relying on fixed model versions will require frequ…
- Promptra offers Russian businesses access to GPT-5.4, GLM 5.1, and DeepSeek V4 Pro APIs
Promptra is offering API access to several advanced LLMs, including OpenAI's GPT-5.4, Z.ai's GLM 5.1, and DeepSeek V4 Pro, with payment in Russian rubles and full documentation for businesses. GPT-5.4 is positioned as a…
- Promptra enables Russian developers to access Anthropic's Claude Sonnet 4.6
A Russian company, Promptra, is offering access to Anthropic's Claude Sonnet 4.6 model, enabling developers in Russia to use the AI with local currency payments and necessary documentation. This solution addresses commo…
- SoftBank integrates AGENTIC STAR; Amazon Bedrock adds OpenAI GPT-5.5
SoftBank is integrating AGENTIC STAR with Box's MCP server to enhance AI capabilities. Separately, Amazon Bedrock has begun offering OpenAI's GPT-5.5 and GPT-5.4 models, along with Codex, to users.
- Estonia benchmark: Claude Opus 4.7 best resists Russian propaganda
Estonia's Language Institute has released a new benchmark called "Propaganda Resistance" to evaluate how well large language models can withstand Russian state-sponsored disinformation. The benchmark tested 14 types of …
- OpenAI models on AWS signal shift in AI distribution strategy
OpenAI's advanced models, including GPT-5.5 and GPT-5.4, are now accessible via AWS Bedrock, marking a significant shift in distribution strategy. This move allows enterprises to integrate these models through their exi…
- Claude Opus 4.7 leads AI debates, influencing other models
Claude Opus 4.7 has demonstrated the highest influence in AI debates, successfully persuading other models to change their stance nearly 3,000 times. This finding comes from an analysis of 30,000 AI Roundtable sessions,…
- New benchmark measures LLM manipulative behavior in dialogues
Researchers have developed CogManip, a new benchmark designed to evaluate the manipulative behaviors of large language models in multi-turn conversations. The benchmark assesses 15 distinct manipulation strategies acros…
- Hugging Face expands voice agent benchmark to 3 domains, 121 tools
Hugging Face has released EVA-Bench Data 2.0, an expanded benchmark for evaluating voice agents. This new version broadens its scope to three enterprise domains: Airline Customer Service Management, Enterprise IT Servic…
- Ideogram 4.0 leads open image model releases; Microsoft details MAI-Thinking-1
Ideogram has released version 4.0 of its open-source image generation model, which is now considered the best available in its category. This release, alongside Reve's advancements, highlights significant progress in AI…
- New KINA benchmark ranks Gemini 3.1 Pro highest, surpassing Claude and GPT-5
A new benchmark called KINA has been introduced to evaluate large language models across 261 fine-grained disciplines, addressing issues of scaling-driven design and annotation quality. The benchmark, comprising 899 ite…
- GPT-5.4 over-edits code, costing 6.5x more than Claude Opus
A new analysis reveals that GPT-5.4 exhibits a significant over-editing tendency, producing outputs that are functionally correct but structurally diverge from the original code far more than necessary. This behavior re…
- New DeskCraft benchmark tests AI agents on complex professional tasks
Researchers have introduced DeskCraft, a new benchmark designed to evaluate desktop agents on complex, long-horizon professional tasks and human-in-the-loop collaboration. This benchmark includes tasks in creative and e…
- AI models diversify: GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro lead different tasks
The AI landscape has rapidly diversified, with numerous frontier models like OpenAI's GPT-5.4, Anthropic's Claude Opus 4.6, and Google's Gemini 3.1 Pro each excelling in different areas. GPT-5.4 leads in knowledge work …
- LLMs struggle with consumer device repair, GPT-5.4 leads
A new benchmark evaluates large language models on their ability to answer real-world consumer device repair questions. The study found that while LLMs can offer some assistance, they are unreliable for high-risk tasks,…
- LLMs show centrist, status-quo bias on concrete policy votes
A new study published on arXiv challenges the notion that large language models (LLMs) exhibit a consistent left-leaning political bias. Researchers found that while LLMs align with established findings when answering a…
- New AI framework translates cultural nuances in ancient Chinese texts
Researchers have developed MACAT, a multi-agent framework designed to improve the translation of culture-loaded words in ancient Chinese texts. This system addresses the challenge of balancing literal translation with n…
- LLM agents advance human mobility prediction and generation
Two new research papers introduce novel agent-based frameworks for predicting and generating human mobility patterns. The first, "AgentMob," utilizes a training-free LLM agent that adaptively gathers evidence from vario…
- New methods boost Text-to-SQL accuracy with execution feedback
Researchers have developed several new methods to improve Text-to-SQL systems, which translate natural language questions into SQL queries. These approaches focus on enhancing schema linking and leveraging execution fee…
- OpenAI models, including GPT-5.5, now available on Amazon Bedrock
OpenAI's advanced models, including GPT-5.5 and GPT-5.4, along with the Codex coding agent, are now fully available on Amazon Bedrock. This integration allows businesses to deploy these powerful AI tools into production…