GPT-5.4
PulseAugur coverage of GPT-5.4: every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.
- product of OpenAI 100%
- developed by OpenAI 100%
- instance of large language model 90%
- competes with DeepSeek 80%
- competes with MiMo V2.5 Pro 80%
- competes with Claude Opus 4.6 70%
- competes with Gemini 3.1 Pro 70%
- used by arXiv 70%
- used by large language models 70%
- uses Codex 70%
- competes with Kimi K2.6 70%
- competes with Claude Opus 4.7 70%
13 days of sentiment data
-
AI system generates formally verified distributed systems
Researchers have developed Inductive Deductive Synthesis (IDS), a new AI system capable of generating formally verified distributed systems. Unlike previous AI coding agents that struggle with formal guarantees, IDS syn…
-
2026 LLM Landscape: No Single Best, Task Routing is Key
In 2026, the AI landscape features over 500 models, with no single "best" LLM available. Instead, users are advised to route tasks to specialized models for optimal results. For instance, ChatGPT (GPT-5.4) excels as an …
-
Microsoft Research's Webwright boosts AI web agent performance
Microsoft Research has developed Webwright, an open-source framework that allows AI agents to interact with the web using a terminal-based approach. Unlike traditional agents that act one step at a time in a browser, We…
-
Cursor AI coding assistant surprises with efficient Kimi-based Composer model
A Reddit user expressed surprise at the improved performance of the Cursor AI coding assistant, noting that its Composer model, based on Kimi, significantly outperforms expectations. The user found Composer to be far mo…
-
Microsoft launches Fara1.5 agents that outperform OpenAI and Google
Microsoft Research has introduced Fara1.5, a series of three browser computer-use agent models (4B, 9B, and 27B parameters) built upon Qwen3.5. These agents are designed to interact with real browsers by interpreting sc…
-
Frontier LLMs fall short in cybersecurity tasks, study finds
A new research paper evaluates the readiness of frontier large language models for cybersecurity tasks, finding that general-purpose models struggle with both vulnerability detection and security testing. The study test…
-
HealthCraft environment tests AI safety in emergency medicine
Researchers have developed HealthCraft, a novel reinforcement learning environment designed to evaluate the safety of AI models in emergency medicine scenarios. This environment simulates realistic clinical conditions a…
-
DivSkill-SQL boosts Text-to-SQL ensembles with complementary agent training
Researchers have developed DivSkill-SQL, a novel framework for enhancing Text-to-SQL ensembles. This method optimizes complementary skills by training new agents on examples that the existing ensemble fails on, thereby …
-
Alibaba's Qwen 3.6 open-weight model rivals frontier AI on coding tasks
Alibaba's Qwen 3.6 model family, particularly the 27B dense variant, has demonstrated performance competitive with leading frontier models like GPT-5.4 and Claude 4.6 on coding tasks. This open-weight model, runnable on…
-
AI models fail to reliably forecast scientific progress, study finds
A new benchmark called CUSP has been developed to evaluate AI's ability to forecast scientific progress. The study found that current frontier AI models struggle with predicting the realization and timing of scientific …
-
Microsoft Security Copilot uses AI agent for autonomous threat detection
Microsoft has developed a Dynamic Threat Detection Agent (DTDA) integrated into its Security Copilot, designed to autonomously investigate security incidents and generate novel alerts. This agent utilizes a unified acti…
-
New attack method enhances adversarial transferability in MLLMs
Researchers have developed FRA-Attack, a novel method to improve the transferability of adversarial attacks against multimodal large language models (MLLMs). This technique utilizes frequency-domain regularization to al…
-
Developer finds Claude Code Extension optimal for AI-assisted coding
A software developer details their journey to find the optimal AI coding assistant, ultimately settling on VS Code with the Claude Code Extension and a MAX plan. They found that while tools like GitHub Copilot and Curso…
-
LLMs struggle to simulate real human behavior, new research shows
Two new research papers explore the limitations of current large language models in simulating realistic human behavior. The first paper, "OmniBehavior," introduces a benchmark using real-world data and finds that LLMs …
-
Databricks launches beta Unity AI Gateway Guardrails for AI security
Databricks has launched a beta version of its Unity AI Gateway Guardrails, designed to enhance the security and compliance of AI applications. These guardrails help prevent sensitive data leakage, protect against malici…
-
LLMs generate gendered behaviors, impacting trust calibration in agents
Researchers have developed a method to generate multimodal behaviors for socially interactive agents, aiming to calibrate user trust based on an agent's capabilities and benevolence. The study utilized GPT-5.4 to produc…
-
Alibaba's Qwen 3.7 previews rank first among Chinese models in text and vision benchmarks
Alibaba's Qwen team has released preview versions of its Qwen 3.7 Max and Qwen 3.7 Plus models, showcasing rapid iteration cycles. The Qwen 3.7 Max model has achieved top rankings among Chinese models in text-based benc…
-
AI agents struggle with research rigor despite generating papers
A new study published on arXiv introduces ResearchArena, a framework designed to evaluate the capabilities of AI agents in conducting research autonomously. The system allowed agents like Claude Code, Codex, and Kimi Co…
-
Cursor launches Composer 2.5 AI coding assistant with enhanced intelligence
Cursor has released Composer 2.5, an updated AI coding assistant that offers improved intelligence and reliability for long-running tasks. This new version is built upon Moonshot AI's Kimi K2.5 architecture and incorpor…
-
AI systems take top spots in EgoVis 2026 challenges
Two research teams have presented technical reports for challenges at the EgoVis 2026 conference. One team, JFAA, secured first place in the EPIC-KITCHENS-100 Action Anticipation Challenge using a JEPA-based method for …