GPT-5.4
PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.
- subsidiary of OpenAI 100%
- developed by OpenAI 100%
- instance of large-language models 90%
- competes with DeepSeek 80%
- competes with MiMo V2.5 Pro 80%
- competes with Claude Opus 4.6 70%
- competes with Gemini 3.1 Pro 70%
- used by arXiv 70%
- used by large-language models 70%
- uses codex 70%
- competes with Kimi K2.6 70%
- competes with Claude Opus 4.7 70%
Sentiment data available for 14 days
-
AI developers face rate limits, latency; routing is key
Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…
-
New method corrects Simpson's Paradox to improve AI text detection
Researchers have identified a significant issue in detecting machine-generated text, stemming from a phenomenon akin to Simpson's Paradox. Current methods average token scores, which masks a non-uniform signal across th…
-
Adversarial examples trick VLMs into laundering AI authority, spreading misinformation
Researchers have demonstrated a new vulnerability in vision-language models (VLMs) called "AI authority laundering." This attack involves subtly altering images so that VLMs confidently provide authoritative responses a…
-
AsymmetryZero framework operationalizes human preferences for AI evaluation
Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…
-
Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals
Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…
-
New MRI-Eval benchmark reveals LLMs struggle with GE scanner operations
Researchers have developed MRI-Eval, a new benchmark designed to assess large language models' understanding of MRI physics and GE scanner operations. The benchmark, comprising 1365 questions across three difficulty tie…
-
New dataset and benchmark advance Bangla text-to-gloss translation for BdSL
Researchers have developed the first dataset and benchmark for Bangla text-to-gloss translation, addressing a significant gap for the Bangla Sign Language (BdSL) community. The dataset includes manually annotated and sy…
-
Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
-
Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors
Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…
-
GPT-5.4 leads LLMs in new EU digital battery passport conformance task
Researchers have introduced BatteryPass-12K, the first dataset designed for classifying digital battery passport conformance, in anticipation of the EU's upcoming battery regulation. They evaluated 22 language models, f…
-
AI agent swarms may fail due to 'Inverse-Wisdom Law,' study finds
A new paper introduces the Inverse-Wisdom Law, challenging the assumption that AI agent swarms benefit from the "Wisdom of the Crowd." The research demonstrates that these swarms can prioritize internal architectural ag…
-
New VeriGround model achieves reliable circuit-to-Verilog code generation
Researchers have identified a significant reliability issue in multimodal large language models (MLLMs) when generating hardware description language (HDL) code from circuit diagrams. This "Mirage" phenomenon occurs whe…
-
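Lingo.dev releases v1.0 with an AI-powered localization engine
Lingo.dev has launched version 1.0 of its localization platform, introducing retrieval-augmented localization (RAL). This approach injects glossary context and brand-voice rules into LLM requests to improve translation accuracy and prevent terminology drift. The platform supports a range of LLM providers, integrates through a CLI, CI/CD, and an API, and provides detailed logging for quality assurance.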
-
AWS launches Amazon Quick, integrates OpenAI models into Bedrock
Amazon Web Services has launched Amazon Quick, an AI agent designed to integrate with local files, emails, and applications to streamline workflows. The company also announced a deeper partnership with OpenAI, bringing …
-
Xiaomi open-sources MiMo-V2.5 AI models, showcasing macOS simulation and high token efficiency
Xiaomi has officially open-sourced its MiMo-V2.5 series of AI models, including the flagship MiMo-V2.5 Pro agent model. These models demonstrate strong performance, rivaling top closed-source models like Claude Opus 4.6…
-
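Xiaomi's open-source MiMo-v2.5-Pro model rivals top AI coding assistants
Xiaomi has released MiMo-v2.5-Pro, an open-source language model focused on coding that shows impressive capability on complex tasks. The model completed a university-level compiler project within hours, built a fully functional video editor application from a vague prompt, and solved analog circuit design problems. MiMo-v2.5-Pro performs strongly on coding benchmarks, rivaling top closed-source models such as GPT-5.4 and Claude Opus 4.6, and is now available on HuggingFace.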
-
Frontier LLMs like GPT-5.4 and Claude Opus 4.7 show significant verbal tics
A new paper analyzes the prevalence of verbal tics, such as repetitive phrases and sycophantic openers, in eight leading large language models. Researchers developed a Verbal Tic Index (VTI) to quantify these tics, find…
-
Claude Opus 4.7 leads frontier agents in AI research acceleration benchmark
A new research paper proposes a benchmark to assess AI's ability to autonomously implement machine learning pipelines, aiming to detect early signs of recursive self-improvement. Frontier coding agents were tasked with …
-
GPT-5.4 and Claude Opus 4.6 fail banking benchmark, scoring 0% client-ready outputs
A new benchmark called BankerToolBench has revealed significant shortcomings in current large language models when applied to financial tasks. GPT-5.4, Claude Opus 4.6, and other models were tested on simulated junior i…
-
DeepSeek releases V4 Pro and Flash models with 1M context, runs on Huawei chips
DeepSeek has released its new V4 family of models, including V4 Pro and V4 Flash, which boast a 1 million token context window. These models were trained on 32 trillion tokens and feature a novel hybrid attention system…