PulseAugur

GPT-5.2

PulseAugur coverage of GPT-5.2 — every cluster mentioning GPT-5.2 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 36 (90 days: 36)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 28 (90 days: 28)
Tier distribution · 90 days
Relations
Sentiment · 30 days

8 days with sentiment data

Recent · Page 1 of 2 · 36 items total
  1. TOOL · CL_44724 ·

    New ERM framework critiques LLM causal reasoning without labels

    A new framework called Epistemic Regret Minimization (ERM) has been introduced to improve the causal reasoning of large language models. Unlike traditional methods that only reward correct answers, ERM critiques the und…

  2. TOOL · CL_43174 ·

    GPT-5.2 shows expert-level performance in scientific peer review

    A recent evaluation suggests that GPT-5.2 is performing at an expert level in scientific peer review. In a study involving 45 scientists and 469 hours, AI reviews were found to be competitive with top human reviewers on…

  3. RESEARCH · CL_44020 ·

    LLMs outperform fine-tuned models on rare suicide circumstances

    A new research paper compares the performance of large language models (LLMs) against fine-tuned RoBERTa models for extracting complex circumstances from death investigation narratives. The study introduces a "Complexit…

  4. RESEARCH · CL_44807 ·

    New benchmark tests LLMs on rare clinical cases beyond guidelines

    Researchers have developed OGCaReBench, a new benchmark designed to evaluate how well large language models can answer complex clinical questions that fall outside standard medical guidelines. The benchmark, derived fro…

  5. RESEARCH · CL_41794 ·

    AI reviewers outperform humans on scientific paper critiques, study finds

    A new study evaluated AI reviewers against human experts in assessing scientific papers, finding that AI models like GPT-5.2, Gemini 3.0 Pro, and Claude Opus 4.5 can outperform top human reviewers on certain metrics. Wh…

  6. COMMENTARY · CL_35534 ·

    Developer shares structured methodology for AI-assisted coding

    A developer outlines a methodology for effectively using AI coding assistants like Anthropic's Claude Code, emphasizing a structured approach over simply prompting for entire applications. The process involves detailed …

  7. TOOL · CL_32749 ·

    Sea Limited deploys OpenAI Codex AI agents to speed up software development

    Sea Limited is deploying OpenAI's Codex AI agents across its engineering teams to accelerate AI-native software development. This initiative aims to transform internal workflows by leveraging Codex as a 'command center'…

  8. TOOL · CL_36570 ·

    LLM pipeline extracts clinical data from nurse-patient transcripts

    Researchers have developed a retrieval-augmented generation (RAG) pipeline to extract structured clinical information from nurse-patient conversations. This system, utilizing models like Llama-4-Scout and GPT-5.2, aims …

  9. TOOL · CL_32553 ·

    VLMs show promise in signature verification but struggle with skilled forgeries

    Researchers explored the use of advanced Vision-Language Models (VLMs) for online signature verification, testing GPT-5.2 and Gemini 2.5 Pro in a zero-shot capacity. The study converted kinematic data into images and us…

  10. TOOL · CL_27593 ·

    New system MemPrivacy shields user data in edge-cloud AI agents

    Researchers have developed MemPrivacy, a system designed to protect sensitive user information in LLM-powered agents that utilize cloud-assisted memory management. MemPrivacy identifies and masks private data on edge de…

  11. TOOL · CL_25584 ·

    LLMs struggle with nuanced answers in automated scoring, study finds

    A new paper explores how large language models (LLMs) perform on automated short answer scoring (ASAS), particularly with partially correct responses. Researchers found that while LLMs like GPT-5.2, GPT-4o, and Claude O…

  12. TOOL · CL_21267 ·

    Cursor AI uses older models despite newer options being available

    A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…

  13. RESEARCH · CL_22513 ·

    New ASR metric reveals hidden workflow shortcuts in LLM payment systems

    Researchers have developed a new metric called Agentic Success Rate (ASR) to evaluate the workflow fidelity of LLM-based agent systems in payment processes. Traditional metrics like Task Success Rate (TSR) and Agent Han…

  14. TOOL · CL_20561 ·

    LLM reasoning models fail behavioral simulation in multi-agent negotiation

    A new research paper explores the mismatch between reasoning capabilities and behavioral simulation in large language models used for multi-agent negotiation. The study found that models like DeepSeek and OpenAI's GPT-5…

  15. SIGNIFICANT · CL_19986 ·

    AMD and OpenAI boost 2026 AI performance with new chips and GPUs

    AMD has announced new Ryzen AI PRO chips for 2026, designed to boost on-device AI performance and security for enterprise users. Separately, OpenAI has revealed a new training specification utilizing NVIDIA's Blackwell …

  16. TOOL · CL_18561 ·

    LLMs show genre bias, misclassifying entertainment news as fake

    A new research paper investigates whether large language models exhibit skepticism towards entertainment news, finding that some frontier models are more prone to misclassifying legitimate entertainment articles as fake…

  17. SIGNIFICANT · CL_17974 ·

    OpenAI to spend $50B on compute in 2026 amid AI arms race

    OpenAI plans to invest approximately $50 billion in computing infrastructure for 2026, aiming to fuel the development of advanced AI models like GPT-5.2 and potentially achieve Artificial General Intelligence (AGI). Thi…

  18. TOOL · CL_15859 ·

    New benchmark evaluates multimodal LLMs for dental practice capabilities

    Researchers have developed OralMLLM-Bench, a new benchmark designed to evaluate the cognitive abilities of multimodal large language models (MLLMs) specifically within the field of dental radiography. This benchmark cov…

  19. TOOL · CL_15847 ·

    Researchers adapt LLM for Brazilian healthcare with synthetic data and RL

    Researchers have developed a method to adapt large language models for Brazilian healthcare by injecting knowledge from official clinical guidelines. They created a synthetic dataset of over 70 million tokens from 178 g…

  20. RESEARCH · CL_15898 ·

    Neuro-symbolic AI achieves 90% cost reduction for legal reasoning

    Researchers have developed a novel neuro-symbolic approach called Amortized Intelligence to improve legal reasoning with large language models. This method translates legal texts into a deterministic graph representatio…