ENTITY GPQA Diamond

PulseAugur coverage of GPQA Diamond: every cluster mentioning GPQA Diamond across labs, papers, and developer communities, ranked by signal.

Total · 30d: 22 (22 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 16 (16 over 90d)

TIER MIX · 90D

frontier release 1
research 10
tool 10
commentary 1

SENTIMENT · 30D

6 days with sentiment data

RECENT · PAGE 1/2 · 22 TOTAL

RESEARCH · CL_112056 · Jun 26 · 08:18

Nobel laureate John Jumper joins Anthropic from Google DeepMind

John Jumper, a Nobel laureate and co-creator of AlphaFold, has joined Anthropic from Google DeepMind. His move comes shortly after another key Google researcher, Noam Shazeer, departed for OpenAI. Jumper's arrival at An…
RESEARCH · CL_111567 · Jun 25 · 00:00

New research reveals co-failure ceiling limits LLM ensemble gains

A new research paper introduces the concept of a "co-failure ceiling" to explain the limitations of combining multiple large language models. The study demonstrates that the accuracy gains from ensemble methods like rou…
TOOL · CL_108106 · Jun 24 · 04:00

Sakana Fugu orchestrator models combine LLMs for collective intelligence

Researchers have developed Sakana Fugu, a family of orchestrator models designed to combine the specialized capabilities of multiple Large Language Models (LLMs) into a collectively intelligent system. These models act …
RESEARCH · CL_104766 · Jun 20 · 00:00

New decoding strategy bypasses LLM alignment tax for better reasoning

Researchers have introduced a novel decoding strategy called Confident Decoding, which aims to mitigate the "alignment tax" in large language models. This tax occurs when final layers of LLMs, after being fine-tuned for…
SIGNIFICANT · CL_95036 · Jun 16 · 14:50

SubQ unveils SubQ 1.1 Small with 12M-token context and sparse attention

SubQ has released its SubQ 1.1 Small model, featuring a new Subquadratic Sparse Attention (SSA) architecture designed to overcome the quadratic scaling limitations of traditional attention mechanisms. This new architect…
SIGNIFICANT · CL_95355 · Jun 16 · 00:00

Fireworks AI offers Zhipu AI's GLM-5.2, top open-weights coding model

Fireworks AI has announced that GLM-5.2 is now available on its inference platform, highlighting its performance as the top-ranked open-weights model for coding and third overall on the GDPval-AA benchmark. The model, d…
TOOL · CL_85566 · Jun 11 · 13:00

LLM benchmarks saturate quickly due to training data contamination

Public LLM benchmarks are becoming saturated and less useful for differentiating top-tier models because model training data inadvertently includes benchmark questions. This contamination issue, observed in benchmarks l…
SIGNIFICANT · CL_56706 · May 28 · 08:20

Alibaba's Qwen3.7-Max debuts with 1M context, autonomous coding

Alibaba has released Qwen3.7-Max, an agent-first LLM with a 1 million token context window, capable of autonomous coding tasks. The model demonstrated a 35-hour coding session without human intervention, optimizing code…
RESEARCH · CL_61375 · May 27 · 18:09

NVIDIA quantizes Alibaba's Qwen3.6-35B model for efficient deployment

NVIDIA has released a quantized version of Alibaba's Qwen3.6-35B-A3B model, named nvidia/Qwen3.6-35B-A3B-NVFP4. This model utilizes the NVFP4 data type, reducing memory requirements by approximately 3.06x while maintain…
RESEARCH · CL_56153 · May 26 · 18:26

New framework unpacks LLM pipeline failures in detection and correction

A new research paper introduces a framework to understand the puzzling behaviors observed in multi-stage Large Language Model (LLM) pipelines, such as accuracy plateaus and reversals. The proposed model decomposes agent…
TOOL · CL_51144 · May 26 · 04:00

LLMs improve reasoning with new Verification-First prompting strategy

Researchers have developed a new prompting strategy called Verification-First (VF) to improve Large Language Model reasoning without significant training costs or extensive sampling. This method prompts LLMs to verify a…
TOOL · CL_44823 · May 22 · 04:00

New STAND technique slashes LLM reasoning latency by 65%

Researchers have developed STAND (STochastic Adaptive N-gram Drafting), a new model-free speculative decoding technique designed to accelerate language model reasoning. This method leverages the redundancy in reasoning …
RESEARCH · CL_42520 · May 20 · 14:51

LLM chain-of-thought reasoning found to be unfaithful

Recent research indicates that Chain-of-Thought (CoT) reasoning in large language models is not always faithful to the model's internal decision-making process. Studies reveal that models may generate plausible-sounding…
RESEARCH · CL_21935 · May 8 · 00:00

Apple's RVPO framework enhances LLM alignment by penalizing reward variance

Researchers have introduced Reward-Variance Policy Optimization (RVPO), a novel framework designed to improve the alignment of large language models with multiple objectives. Unlike existing methods that average rewards…
COMMENTARY · CL_20705 · May 7 · 04:27

AI models: Choose benchmarks over hype for true performance

A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …
TOOL · CL_20624 · May 7 · 04:00

New fine-tuning method boosts LLM knowledge injection without paraphrasing

Researchers have developed a new fine-tuning method called Diffusion-Inspired Masked Fine-Tuning (DMT) for autoregressive large language models (LLMs). This technique aims to improve the injection of factual knowledge i…
RESEARCH · CL_14447 · May 4 · 04:00

New method enhances LLM reasoning diversity without sacrificing stability

Researchers have introduced Expert-Sample, a novel training-free method designed to enhance the performance of fine-grained Mixture-of-Experts (MoE) models. This technique addresses the trade-off between diversity and s…
RESEARCH · CL_14144 · Apr 30 · 20:30

State Stream Transformer V2 enhances LLM reasoning with parallel training and latent state streaming

Researchers have developed the State Stream Transformer (SST) V2, an architectural innovation designed to enhance latent space reasoning in language models. Unlike standard transformers that reset context at each step, …
RESEARCH · CL_47651 · Apr 29 · 00:00

DeepSeek-V4 Pro model with 1.6T parameters now on Together AI

DeepSeek-V4 Pro, a large Mixture-of-Experts model with 1.6 trillion parameters, is now accessible on the Together AI platform. This model is designed for long-context reasoning, supporting up to a 512K-token context win…
RESEARCH · CL_03564 · Apr 25 · 19:13

FINAL-Bench/Darwin-36B-Opus · Hugging Face

The Darwin-36B-Opus model, a 36-billion-parameter mixture-of-experts language model, has been released. It was created using the Darwin V7 evolutionary breeding engine, combining aspects of Qwen/Qwen3.6-35B-A3B and a Cl…