GPT-5.2

PulseAugur coverage of GPT-5.2: every cluster mentioning GPT-5.2 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 65 (65 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 52 (52 over 90d)

TIER MIX · 90D

frontier release: 1
significant: 2
research: 24
tool: 35
commentary: 2
meme: 1

SENTIMENT · 30D

21 days with sentiment data

RECENT · PAGE 2/4 · 65 TOTAL

RESEARCH · CL_62272 · May 29 · 15:46

Local LLMs show promise for confidential translation work

A new research paper benchmarks locally runnable language models for confidential translation tasks, expanding upon previous work with a multilingual corpus. The study evaluates several local LLMs using Ollama across fo…
TOOL · CL_57927 · May 28 · 21:25

Open-Source LLMs Evolve: Attention, Multimodality, and Efficiency Gains

The open-source LLM landscape has seen significant shifts in recent months, with Sliding Window Attention becoming mainstream, enabling much larger context windows. QK-Norm is also gaining traction as a training stabili…
TOOL · CL_53673 · May 27 · 04:00

New AI Framework Automates Complex Materials Science Calculations

Researchers have developed AutoDFT, a novel closed-loop multi-agent framework designed to automate Density Functional Theory (DFT) calculations in materials science. Unlike previous LLM-based agents that only plan upfro…
TOOL · CL_51307 · May 26 · 04:00

New benchmark and model advance AI customer service capabilities

Researchers have developed a new benchmark, OlaBench, and a corresponding model, OlaMind, to better evaluate and improve customer service AI systems. Existing benchmarks often fail to capture real-world dialogue nuances…
TOOL · CL_51196 · May 26 · 04:00

LLMs struggle with partially correct answers in automated scoring

A new research paper explores the challenges of automated short answer scoring (ASAS) using large language models (LLMs). The study found that while LLMs like GPT-5.2, GPT-4o, and Claude Opus 4.5 perform well on fully c…
RESEARCH · CL_51036 · May 26 · 04:00

New AI text detector READER outperforms larger models

Researchers have developed READER, a novel system for detecting AI-generated text that outperforms larger models by incorporating a reasoning-based approach. This system, fine-tuned on a curated dataset of rationales an…
TOOL · CL_81354 · May 26 · 00:00

LLMs as mutation operators boost evolutionary search in DEI framework

Researchers have developed DEI, a distributed Quality-Diversity search framework that leverages heterogeneous large language models as mutation operators. This approach enhances evolutionary inference by utilizing the d…
TOOL · CL_50829 · May 24 · 22:18

AI models show improved adherence to behavioral constitutions

A new audit pipeline reveals that while AI models are improving at adhering to their specified behavioral constitutions, they still exhibit significant failure rates. The pipeline, which decomposes specifications into t…
TOOL · CL_44724 · May 22 · 04:00

New ERM framework critiques LLM causal reasoning without labels

A new framework called Epistemic Regret Minimization (ERM) has been introduced to improve the causal reasoning of large language models. Unlike traditional methods that only reward correct answers, ERM critiques the und…
TOOL · CL_43174 · May 21 · 23:09

GPT-5.2 shows expert-level performance in scientific peer review

A recent evaluation suggests that GPT-5.2 is performing at an expert level in scientific peer review. In a study involving 45 scientists and 469 hours, AI reviews were found to be competitive with top human reviewers on…
RESEARCH · CL_44020 · May 21 · 00:33

LLMs outperform fine-tuned models on rare suicide circumstances

A new research paper compares the performance of large language models (LLMs) against fine-tuned RoBERTa models for extracting complex circumstances from death investigation narratives. The study introduces a "Complexit…
RESEARCH · CL_44807 · May 20 · 23:04

New benchmark tests LLMs on rare clinical cases beyond guidelines

Researchers have developed OGCaReBench, a new benchmark designed to evaluate how well large language models can answer complex clinical questions that fall outside standard medical guidelines. The benchmark, derived fro…
RESEARCH · CL_41794 · May 20 · 03:33

AI reviewers outperform humans on scientific paper critiques, study finds

A new study evaluated AI reviewers against human experts in assessing scientific papers, finding that AI models like GPT-5.2, Gemini 3.0 Pro, and Claude Opus 4.5 can outperform top human reviewers on certain metrics. Wh…
COMMENTARY · CL_35534 · May 17 · 12:30

Developer shares structured methodology for AI-assisted coding

A developer outlines a methodology for effectively using AI coding assistants like Anthropic's Claude Code, emphasizing a structured approach over simply prompting for entire applications. The process involves detailed …
TOOL · CL_32749 · May 15 · 03:39

Sea Limited deploys OpenAI Codex AI agents to speed up software development

Sea Limited is deploying OpenAI's Codex AI agents across its engineering teams to accelerate AI-native software development. This initiative aims to transform internal workflows by leveraging Codex as a 'command center'…
TOOL · CL_36570 · May 14 · 23:13

LLM pipeline extracts clinical data from nurse-patient transcripts

Researchers have developed a retrieval-augmented generation (RAG) pipeline to extract structured clinical information from nurse-patient conversations. This system, utilizing models like Llama-4-Scout and GPT-5.2, aims …
TOOL · CL_32553 · May 14 · 13:53

VLMs show promise in signature verification but struggle with skilled forgeries

Researchers explored the use of advanced Vision-Language Models (VLMs) for online signature verification, testing GPT-5.2 and Gemini 2.5 Pro in a zero-shot capacity. The study converted kinematic data into images and us…
TOOL · CL_27593 · May 10 · 13:31

New system MemPrivacy shields user data in edge-cloud AI agents

Researchers have developed MemPrivacy, a system designed to protect sensitive user information in LLM-powered agents that utilize cloud-assisted memory management. MemPrivacy identifies and masks private data on edge de…
TOOL · CL_25584 · May 8 · 12:12

LLMs struggle with nuanced answers in automated scoring, study finds

A new paper explores how large language models (LLMs) perform on automated short answer scoring (ASAS), particularly with partially correct responses. Researchers found that while LLMs like GPT-5.2, GPT-4o, and Claude O…
TOOL · CL_21267 · May 7 · 18:45

Cursor AI uses older models despite newer options being available

A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…