GPT-5.3-Codex
PulseAugur coverage of GPT-5.3-Codex — every cluster mentioning GPT-5.3-Codex across labs, papers, and developer communities, ranked by signal.
- 2026-02-06 product_launch OpenAI released GPT-5.3-Codex, a new flagship AI model.
1 day with sentiment data
-
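New LivePI benchmark reveals AI agents' vulnerability to prompt injection
Researchers have developed LivePI, a new benchmark designed to assess more realistically the risk that indirect prompt injection poses to AI agents. The benchmark simulates real-world scenarios across input channels such as email, web pages, and chat, and evaluates twelve attack families and five malicious objectives. Initial tests on leading models, including GPT-5.3-Codex and Claude Opus 4.6, revealed significant vulnerabilities: group-chat injection proved universally successful, while repository-link attacks caused high-severity failures. The proposed two-layer defense, which combines prompt filtering with tool-call authorization, without compromising agent utility…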
-
Cursor AI uses older models despite newer options being available
A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…
-
Fabrica launches as a terminal-based coding agent supporting multiple AI models
Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …
-
Grok 4.2 outperforms GPT-5.3 in math tests, claims top spot in writing
Grok 4.2 achieved a 70.4% success rate on mathematical tests. This performance reportedly surpasses that of GPT-5.3, markin…
-
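Large language models struggle to reproduce physics experiment results and fall short at numerical simulation
A new preprint from Peking University evaluates the ability of large language models to reproduce numerical results from physics experiment papers. The researchers found that every LLM tested, including OpenAI Codex powered by GPT-5.3, had an end-to-end callback rate of 0%, meaning that none could reproduce any complete numerical result. Although the models showed a deep understanding of the papers' methods, they consistently made errors in data analysis and numerical simulation, leading to incorrect final results. The study identifies several failure modes, such as incorrectly implemented formulas and oversimplified complex physical models.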
-
AI tools offer mixed results for personal life strategy advice
An experiment evaluated eight AI tools, including commercial life-coaching platforms and large language models like GPT-5.3 and Claude Sonnet 4.6, to assess their ability to provide life strategy advice. The user sought…
-
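AI coding agents mature, sparking productivity panic and new tools
The AI development landscape has shifted dramatically: coding agents can now carry out sustained, long-horizon tasks, a change Andrej Karpathy has noted since December 2025. This has given rise to new products such as Perplexity Computer, an orchestration-first agent system, as well as advances in tools such as OpenAI's GPT-5.3-Codex and GitHub Copilot CLI. However, this rapid progress has also intensified "productivity panic" and a kind of "AI psychosis" among executives and venture investors, who are investing heavily in agents that may not produce measurable value…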