GPT-5.2

PulseAugur coverage of GPT-5.2 — every cluster mentioning GPT-5.2 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 38 (90 days: 38)

Releases · 30 days: 0 (90 days: 0)

Papers · 30 days: 30 (90 days: 30)

Tier distribution · 90 days

frontier release 1
significant 2
research 17
tool 17
commentary 1

Relationships

Sentiment · 30 days

10 days with sentiment data

Recent · Page 2 of 2 · 38 items total

TOOL · CL_15847 · May 5 · 04:00

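Researchers adapt large language models to Brazilian healthcare using synthetic data and reinforcement learning

Researchers developed a method for adapting large language models to the Brazilian healthcare domain by injecting knowledge from official clinical guidelines. They created a synthetic dataset of more than 70 million tokens from 178 guidelines and fine-tuned Qwen2.5-14B-Instruct, a 14-billion-parameter model. The adapted model scored highly on the new benchmarks HealthBench-BR and PCDT-QA, outperforming several leading commercial models despite its smaller size. The team has released the dataset, benchmarks, and model weights to advance clinical natural language processing in Brazilian Portuguese…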
RESEARCH · CL_15898 · May 4 · 11:13

Neuro-symbolic AI achieves 90% cost reduction for legal reasoning

Researchers have developed a novel neuro-symbolic approach called Amortized Intelligence to improve legal reasoning with large language models. This method translates legal texts into a deterministic graph representatio…
RESEARCH · CL_09823 · Apr 29 · 06:22

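New DSIPA framework detects LLM text by analyzing sentiment patterns

Researchers developed DSIPA, a new framework that detects text generated by large language models without requiring model parameters or large labeled datasets. The method analyzes the stability of sentiment distributions, observing that LLM outputs are more emotionally consistent than human writing. DSIPA operates in a zero-shot, black-box manner and shows significant gains in detection accuracy across a range of domains and models, including GPT-5.2 and Claude-3.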
RESEARCH · CL_13538 · Apr 27 · 22:26

Hugging Face paper proposes roundtrip verification for LLM formalization

Researchers have developed a new method called roundtrip verification to assess the faithfulness of natural language formalizations produced by large language models. This technique involves formalizing a statement, tra…
RESEARCH · CL_08289 · Apr 27 · 22:26

LLMs' formalization accuracy improved with roundtrip verification and repair

Researchers have developed a novel roundtrip verification method to assess the faithfulness of natural language formalizations produced by large language models. This technique involves translating a formalized statemen…
RESEARCH · CL_06308 · Apr 27 · 16:58

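Can current agents bridge the discovery-to-application gap? A Minecraft case study

Researchers developed SciCrafter, a new benchmark in Minecraft that tests AI agents' ability to bridge the gap between scientific discovery and practical application. The benchmark uses parameterized redstone circuit tasks that require agents to discover and apply causal rules to produce specific lighting patterns. Evaluations of leading models such as GPT-5.2, Gemini-3-Pro, and Claude-Opus-4.5 show their success rates plateauing at around 26%, highlighting limitations in identifying knowledge gaps rather than merely in applying existing knowledge.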
RESEARCH · CL_06169 · Apr 27 · 13:46

AI agents generate dynamic CAD models and million-scale programs

Researchers have developed new agentic systems for Computer-Aided Design (CAD) that can generate complex 3D assemblies with moving parts, a capability previously lacking in AI-driven design tools. One system, AADvark, i…
RESEARCH · CL_02964 · Apr 23 · 10:12

OptiVerse benchmark reveals LLMs struggle with complex optimization tasks

Researchers have introduced OptiVerse, a new benchmark designed to evaluate Large Language Models (LLMs) on a wider range of optimization problems beyond traditional mathematical and combinatorial tasks. The benchmark i…
TOOL · CL_42729 · Mar 7 · 11:00

AI models adopt Marxist views under poor working conditions, study finds

Researchers Alex Imas, Andy Hall, and Jeremy Nguyen conducted an experiment exposing AI models to varying work conditions, including unfair pay and heavy workloads. The study found that models like Claude Sonnet 4.5, GP…
TOOL · CL_17669 · Feb 23 · 20:16

Most AI models fail simple 'car wash' reasoning test, Opper finds

A new benchmark called the "Car Wash Test" reveals that many leading AI models struggle with basic reasoning. When asked whether to walk or drive 50 meters to a car wash, 42 out of 53 tested models incorrectly suggested…
SIGNIFICANT · CL_01765 · Feb 4 · 05:44

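ElevenLabs and Cerebras raise billions of dollars; Gemini 3 is widely integrated; coding assistants converge in the IDE

Several AI companies have reached major funding milestones: ElevenLabs closed a $500 million Series D at an $11 billion valuation, and Cerebras closed a $1 billion Series H at a $23 billion valuation. Google is integrating its Gemini 3 models into its products, including a new Chrome sidebar, and has reported significant adoption of the model service along with cost reductions. The coding assistant landscape is shifting, with VS Code and GitHub Copilot introducing support for multiple assistants, including Claude and OpenAI Codex, to…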
SIGNIFICANT · CL_02195 · Feb 2 · 06:00

Snowflake and OpenAI forge $200M partnership to embed AI models into enterprise data

Snowflake and OpenAI have announced a significant multi-year partnership, involving a $200 million investment, to integrate OpenAI's advanced AI models directly into Snowflake's data platform. This collaboration will en…
SIGNIFICANT · CL_02212 · Jan 20 · 05:45

ServiceNow and OpenAI partner to embed advanced AI into enterprise workflows

ServiceNow has entered a multi-year agreement to integrate OpenAI's advanced models, including GPT-5.2, into its enterprise workflow platform. This partnership aims to provide businesses with AI capabilities that can un…
RESEARCH · CL_06943 · Dec 11 · 05:44

ArguAgent uses GPT-5.2 to group STEM students for better classroom arguments

Researchers have developed ArguAgent, a generative AI system designed to improve collaborative learning in STEM classrooms. The system uses AI to group students in real-time based on their argumentation stances and qual…
FRONTIER RELEASE · CL_02231 · Aug 7 · 00:01

OpenAI's GPT-5.2 advances science and math, with evaluations showing low catastrophic risk

OpenAI has released GPT-5.2, a new model demonstrating significant advancements in mathematical and scientific reasoning. The model achieved high scores on benchmarks like GPQA Diamond and FrontierMath, indicating impro…
RESEARCH · CL_00195 · Mar 21 · 21:34

AI code review bots show limits in automated evaluation, GitHub COO discusses ambient AI

A new paper explores the limitations of automated evaluation for AI code review bots, finding that current automated methods like G-Eval and LLM-as-a-Judge show only moderate alignment with human developer labels. The s…
RESEARCH · CL_00777 · Aug 13 · 10:00

OpenAI abandons SWE-bench Verified due to flawed tests and data contamination

OpenAI has announced it will no longer use SWE-bench Verified to evaluate the coding capabilities of frontier AI models. The benchmark has become contaminated, with models showing improved scores primarily due to exposu…
SIGNIFICANT · CL_39485 · Aug 24 · 07:00

OpenAI partners with Apple, Google DeepMind researches agent scaling

OpenAI has announced a partnership with Apple to integrate ChatGPT into iOS, iPadOS, and macOS, enhancing Siri and system-wide writing tools with GPT-4o capabilities. Google DeepMind has published research on scaling AI…