GPT-5.4
PulseAugur coverage of GPT-5.4 — every cluster mentioning GPT-5.4 across labs, papers, and developer communities, ranked by signal.
- subsidiary of OpenAI 100%
- developed by OpenAI 100%
- instance of large-language models 90%
- competes with DeepSeek 80%
- competes with MiMo V2.5 Pro 80%
- competes with Claude Opus 4.6 70%
- competes with Gemini 3.1 Pro 70%
- used by arXiv 70%
- used by large-language models 70%
- uses Codex 70%
- competes with Kimi K2.6 70%
- competes with Claude Opus 4.7 70%
14 days with sentiment data
-
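AI models show Western bias, with values converging across cultures
A new study auditing large language models found that three leading systems (Claude Sonnet 4.5, GPT-5.4, and Gemini 2.5 Flash) consistently gave individualistic advice, even when responding to dilemmas posed by users from collectivist societies. The AI systems showed a clear Western value bias, with the largest gaps observed for users in Nigeria and India. Japan was an exception: the models displayed an outdated stereotype by portraying users as more group-oriented than actual survey data indicates. The study highlights a trend toward value homogenization in frontier AI…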
-
Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
Researchers have developed a new framework called Verbal Process Supervision (VPS) that enhances the reasoning capabilities of large language models without requiring gradient updates. This method utilizes structured na…
-
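AI models evaluated on meeting summarization, with GPT-5.1 showing gains
Researchers developed a reusable pipeline for evaluating AI-generated meeting summaries, designed to adapt to different domains. The system treats both ground truth and AI outputs as structured artifacts, enabling detailed analysis and statistical testing. In benchmarks on city council, private data, and White House press briefing datasets, the evaluation showed GPT-4.1-mini had the highest accuracy, while GPT-5.1 excelled in completeness and coverage, although GPT-5.4 later surpassed GPT-4.1 on all metrics.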
-
AI system enhances science classroom discourse analysis using multi-task learning
Researchers have developed an automated discourse analysis system (ADAS) to classify teacher and student utterances in science classrooms, aiming to understand knowledge construction and improve teaching. The system use…
-
Moonshot AI's Kimi K2.6 tops benchmarks, Bezos eyes $10B AI fundraise
Moonshot AI has released Kimi K2.6, a model claiming superior performance on coding and agentic benchmarks, surpassing models like GPT-5.4 and Claude Opus 4.6. Alibaba's Qwen3.6-Max-Preview also shows improved instructi…
-
OpenAI releases GPT-5.4-Cyber for cybersecurity, contrasting with Anthropic's limited Claude Mythos
OpenAI has released GPT-5.4-Cyber, a specialized version of its GPT-5.4 model, aimed at enhancing cybersecurity defenses. This model, available through OpenAI's Trusted Access for Cyber program, offers capabilities like…
-
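Public AI models reproduce Anthropic's vulnerability-discovery research results
Researchers have successfully reproduced the findings of Anthropic's Mythos research using publicly available AI models such as GPT-5.4 and Claude Opus 4.6. This indicates that advanced AI capabilities for discovering software vulnerabilities are no longer exclusive to frontier labs and can be obtained through public models. Defenders should now shift their focus from the uniqueness of these tools to validating and applying AI-generated security insights.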
-
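OpenAI enables enterprise AI adoption through Cloudflare and Hyatt integrations
OpenAI has partnered with Hyatt to integrate ChatGPT Enterprise into the hotel company's global operations. The partnership aims to raise employee productivity by automating manual tasks, letting staff focus on delivering an exceptional customer experience. In addition, OpenAI is enabling enterprises to deploy AI agents powered by models such as GPT-5.4 and Codex directly in Cloudflare's Agent Cloud. This integration lets enterprises use Cloudflare's edge computing capabilities to automate complex workflows at scale…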
-
RT Artificial Analysis: Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Cla...
Meta AI has released Muse Spark, a new frontier-class multimodal model developed by Meta Superintelligence Labs. This marks Meta's return to the frontier AI race after a period of relative quiet and is their first model…
-
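Canary launches AI QA tool that outperforms GPT-5.4 and Claude Code at code verification
Canary is a newly launched AI-powered QA tool that automates pull request testing by understanding the codebase and generating end-to-end tests for user workflows. The tool aims to catch regressions before code is merged, filling a gap left by current AI coding assistants. Canary also introduced QA-Bench v0, a benchmark for code verification, on which its purpose-built QA agent outperforms models such as GPT-5.4 and Claude Code.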
-
In the Arena: How LMSys changed LLM Benchmarking Forever
The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more d…