PulseAugur

GPT-5.4

PulseAugur coverage of GPT-5.4: every cluster that mentions GPT-5.4 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 111 (111 over 90d)
Releases · 30d: 1 (1 over 90d)
Papers · 30d: 64 (64 over 90d)
TIMELINE
  1. 2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.
SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 1/6 · 111 TOTAL
  1. SIGNIFICANT · CL_80391

    Z.ai's GLM-5.1 tops coding benchmark as open-weight model

    Z.ai has released GLM-5.1, a 744B parameter Mixture-of-Experts model that achieved a score of 58.4% on the SWE-Bench Pro leaderboard in April 2026. This marks the first open-weight model to surpass leading proprietary m…

  2. TOOL · CL_79932

    AI agents use executable world models to solve ARC-AGI-3 benchmark

    A new research paper introduces an executable world model approach for AI agents tackling the ARC-AGI-3 benchmark. This system uses Python to maintain and verify a world model, refactoring it for simplicity and planning…

  3. RESEARCH · CL_79551

    AI generates Traditional Chinese IEPs, outperforming GPT-5.4

    Researchers have developed a novel method for automatically generating Individualized Education Programs (IEPs) in Traditional Chinese, addressing a significant gap in special-education NLP. The proposed Corpus-Grounded…

  4. TOOL · CL_78228

    OpenAI releases Python SDK for Codex agent

    OpenAI has released an official Python SDK for its Codex agent, simplifying its integration into Python-based applications. Previously, developers had to rely on shell commands or a TypeScript SDK, which was inconvenien…

  5. COMMENTARY · CL_76975

    AI agents wreck finance workflows via shared context, not model limits

    An analysis of financial automation workflows highlights that using a single, always-on AI agent across personal, rental, and business accounts leads to dangerous "confident nonsense." The core issue is not the AI model…

  6. TOOL · CL_75589

    AI cost tracking shifts to per-request attribution for better financial oversight

    Developers are increasingly focused on tracking the precise cost of AI model usage, moving beyond simple monthly invoices to per-request attribution. This granular approach allows teams to understand which specific feat…

  7. TOOL · CL_75512

    New GCF format outperforms JSON and TOON in LLM data handling benchmark

    A new benchmark reveals that common data formats like JSON and TOON struggle with large language models, failing to maintain accuracy and validity at scale. The study found that JSON breaks down with as few as 500 recor…

  8. RESEARCH · CL_75515

    Anthropic ships Claude Opus 4.8, accelerating AI agent migration needs

    Anthropic has released Claude Opus 4.8, continuing a rapid release cycle with new versions appearing every 5-7 weeks. This accelerated pace means that production agents relying on fixed model versions will require frequ…

  9. TOOL · CL_75012

    Promptra offers Russian businesses access to GPT-5.4, GLM 5.1, and DeepSeek V4 Pro APIs

    Promptra is offering API access to several advanced LLMs, including OpenAI's GPT-5.4, Z.ai's GLM 5.1, and DeepSeek V4 Pro, with payment in Russian rubles and full documentation for businesses. GPT-5.4 is positioned as a…

  10. TOOL · CL_74640

    Promptra enables Russian developers to access Anthropic's Claude Sonnet 4.6

    A Russian company, Promptra, is offering access to Anthropic's Claude Sonnet 4.6 model, enabling developers in Russia to use the AI with local currency payments and necessary documentation. This solution addresses commo…

  11. SIGNIFICANT · CL_72875

    SoftBank integrates AGENTIC STAR; Amazon Bedrock adds OpenAI GPT-5.5

    SoftBank is integrating AGENTIC STAR with Box's MCP server to enhance AI capabilities. Separately, Amazon Bedrock has begun offering OpenAI's GPT-5.5 and GPT-5.4 models, along with Codex, to users.

  12. TOOL · CL_72287

    Estonia benchmark: Claude Opus 4.7 best resists Russian propaganda

    Estonia's Language Institute has released a new benchmark called "Propaganda Resistance" to evaluate how well large language models can withstand Russian state-sponsored disinformation. The benchmark tested 14 types of …

  13. RESEARCH · CL_72005

    OpenAI models on AWS signal shift in AI distribution strategy

    OpenAI's advanced models, including GPT-5.5 and GPT-5.4, are now accessible via AWS Bedrock, marking a significant shift in distribution strategy. This move allows enterprises to integrate these models through their exi…

  14. TOOL · CL_71271

    Claude Opus 4.7 leads AI debates, influencing other models

    Claude Opus 4.7 has demonstrated the highest influence in AI debates, successfully persuading other models to change their stance nearly 3,000 times. This finding comes from an analysis of 30,000 AI Roundtable sessions,…

  15. RESEARCH · CL_72520

    New benchmark measures LLM manipulative behavior in dialogues

    Researchers have developed CogManip, a new benchmark designed to evaluate the manipulative behaviors of large language models in multi-turn conversations. The benchmark assesses 15 distinct manipulation strategies acros…

  16. RESEARCH · CL_71082

    Hugging Face expands voice agent benchmark to 3 domains, 121 tools

    Hugging Face has released EVA-Bench Data 2.0, an expanded benchmark for evaluating voice agents. This new version broadens its scope to three enterprise domains: Airline Customer Service Management, Enterprise IT Servic…

  17. SIGNIFICANT · CL_70061

    Ideogram 4.0 leads open image model releases; Microsoft details MAI-Thinking-1

    Ideogram has released version 4.0 of its open-source image generation model, which is now considered the best available in its category. This release, alongside Reve's advancements, highlights significant progress in AI…

  18. RESEARCH · CL_70254

    New KINA benchmark ranks Gemini 3.1 Pro highest, surpassing Claude and GPT-5

    A new benchmark called KINA has been introduced to evaluate large language models across 261 fine-grained disciplines, addressing issues of scaling-driven design and annotation quality. The benchmark, comprising 899 ite…

  19. COMMENTARY · CL_69022

    GPT-5.4 over-edits code, costing 6.5x more than Claude Opus

    A new analysis reveals that GPT-5.4 exhibits a significant over-editing tendency, producing outputs that are functionally correct but structurally diverge from the original code far more than necessary. This behavior re…

  20. TOOL · CL_68272

    New DeskCraft benchmark tests AI agents on complex professional tasks

    Researchers have introduced DeskCraft, a new benchmark designed to evaluate desktop agents on complex, long-horizon professional tasks and human-in-the-loop collaboration. This benchmark includes tasks in creative and e…