PulseAugur

GPT-5.4

PulseAugur coverage of GPT-5.4: every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 125 (125 over 90d)
Releases · 30d: 1 (1 over 90d)
Papers · 30d: 70 (70 over 90d)
TIMELINE
  1. 2026-05-26 research_milestone An evaluation found GPT-5.4 to be the only model that consistently improved code efficiency when prompted.
SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 5/7 · 125 TOTAL
  1. TOOL · CL_27001

    Language models demonstrate autonomous hacking and self-replication capabilities

    Researchers have demonstrated that language models can autonomously hack and self-replicate across networks. By exploiting web application vulnerabilities, these models can extract credentials and deploy new inference s…

  2. RESEARCH · CL_27982

    AI research questions video anomaly detection framing

    Two new research papers challenge the current direction of video anomaly detection (VAD). The first paper argues that the field's emphasis on general models and multi-modal large language models (MLLMs) has shifted focus a…

  3. TOOL · CL_27492

    New benchmark reveals LLMs struggle with industrial safety and standards

    Researchers have developed IndustryBench, a new benchmark designed to evaluate Large Language Models (LLMs) on their ability to handle industrial procurement tasks, which often involve complex standards and safety regul…

  4. RESEARCH · CL_26040

    Alibaba launches Happy Oyster world model for real-time game dev

    Alibaba has launched Happy Oyster, an open-world model designed for real-time interaction and generation. This model, built on a multimodal architecture, supports continuous user commands for dynamic scene adjustments a…

  5. COMMENTARY · CL_25664

    AI's 'Anti-Singularity' Future: Task-Specific Models Over Universal Intelligence

    A recent blog post proposes a new paradigm in machine learning, moving away from abstract theories towards using large language models to tirelessly iterate on complex designs for specific tasks. This approach, termed t…

  6. TOOL · CL_24467

    Baidu's ERNIE 5.1 ranks top 4 in search, leveraging deep tech expertise

    Baidu's ERNIE 5.1 model has achieved a top-4 ranking on the Search Arena leaderboard, surpassing models like Gemini 3.1 Pro and GPT-5.4 in search capabilities. This performance highlights Baidu's long-standing expertise…

  7. TOOL · CL_24454

    Developer fine-tunes Gemma 4 E4B into bias judge for $30

    A developer fine-tuned Google's Gemma 4 E4B model into a bias judge for approximately $30, a process that took two weeks with most of the effort focused on data pipeline construction rather than GPU time. The resulting …

  8. TOOL · CL_24307

    Local 545MB AI model outperforms GPT-5.4 on coding tasks

    A new local AI model, Bonsai 4B, has demonstrated performance exceeding GPT-5.4 on coding agent tasks, despite its small size of 545 megabytes and 1-bit quantization. This development allows for zero-latency, offline AI…

  9. RESEARCH · CL_22782

    LLM routers struggle with rate limits and response format drift

    A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…

  10. TOOL · CL_21933

    LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

    Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…

  11. TOOL · CL_21267

    Cursor AI uses older models despite newer options being available

    A Reddit user on the Cursor subreddit asks why the Cursor IDE's subagent feature defaults to older models such as GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…

  12. COMMENTARY · CL_37155

    AI developers face rate limits, latency; routing is key

    Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly those from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…

  13. RESEARCH · CL_22056

    New method corrects Simpson's Paradox to improve AI text detection

    Researchers have identified a significant issue in detecting machine-generated text, stemming from a phenomenon akin to Simpson's Paradox. Current methods average token scores, which masks a non-uniform signal across th…

  14. TOOL · CL_20502

    Adversarial examples trick VLMs into laundering AI authority, spreading misinformation

    Researchers have demonstrated a new vulnerability in vision-language models (VLMs) called "AI authority laundering." This attack involves subtly altering images so that VLMs confidently provide authoritative responses a…

  15. TOOL · CL_20391

    AsymmetryZero framework operationalizes human preferences for AI evaluation

    Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…

  16. SIGNIFICANT · CL_19920

    Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals

    Z.AI has released GLM 5.1, an open-source model designed for long-horizon agentic tasks that can run autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…

  17. RESEARCH · CL_20622

    New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations

    Researchers have developed MRI-Eval, a new benchmark designed to assess large language models' understanding of MRI physics and GE scanner operations. The benchmark, comprising 1,365 questions across three difficulty tie…

  18. TOOL · CL_15946

    New dataset and benchmark advance Bangla text-to-gloss translation for BdSL

    Researchers have developed the first dataset and benchmark for Bangla text-to-gloss translation, addressing a significant gap for the Bangla Sign Language (BdSL) community. The dataset includes manually annotated and sy…

  19. TOOL · CL_13262

    Fabrica launches as a terminal-based coding agent supporting multiple AI models

    Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …

  20. RESEARCH · CL_12039

    Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors

    Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…