ENTITY speculative decoding

speculative decoding

PulseAugur coverage of speculative decoding — every cluster mentioning speculative decoding across labs, papers, and developer communities, ranked by signal.

Total · 30d

8 over 90d

Releases · 30d

0 over 90d

Papers · 30d

6 over 90d

TIER MIX · 90D

significant 1
research 3
tool 4

SENTIMENT · 30D

2 days with sentiment data

RECENT · PAGE 1/1 · 8 TOTAL

TOOL · CL_30971 · May 14 · 06:13

Speculative decoding boosts LLM efficiency with predict-and-verify

A new technique called speculative decoding allows large language models to generate text more efficiently by predicting ahead and then verifying. This method aims to reduce the computational cost of generating each tok…
RESEARCH · CL_25612 · May 8 · 13:08

AI research tackles speculative decoding flaws in LLMs

Two new research papers explore the intricacies of speculative decoding in large language models, a technique used to speed up inference. The first paper identifies a phenomenon called "attention drift" where the model'…
TOOL · CL_15962 · May 5 · 04:00

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

Researchers have developed a new method called TokenTiming, inspired by Dynamic Time Warping, to improve the efficiency of speculative decoding in large language models. This technique allows for the use of draft and ta…
SIGNIFICANT · CL_13509 · May 3 · 08:12

Google's Gemma 4 models achieve 3x speed boost with speculative decoding

Google has released Multi-Token Prediction (MTP) drafters for its Gemma 4 open models, which can increase inference speed by up to three times. This advancement utilizes a speculative decoding architecture, allowing a l…
RESEARCH · CL_12748 · May 2 · 04:13

NVIDIA NeMo RL uses speculative decoding for 1.8x faster AI training

NVIDIA Research has integrated speculative decoding into its NeMo RL framework, resulting in a 1.8x speedup for rollout generation at an 8 billion parameter scale. This advancement, utilizing a vLLM backend, is projecte…
RESEARCH · CL_09381 · Apr 29 · 18:12

LLM training and serving efficiency explained through speculative decoding and paged attention

Reiner Pope has published an analysis detailing the mathematical and technical innovations behind large language model training and serving. The work explains how techniques like speculative decoding and paged attention…
RESEARCH · CL_06923 · Apr 28 · 04:00

New methods KERV and HeiSD accelerate embodied VLA models with kinematic awareness

Two new research papers introduce methods to accelerate the inference speed of Vision-Language-Action (VLA) models used for robot control. KERV utilizes a Kalman Filter to predict actions and adjust acceptance threshold…
RESEARCH · CL_01283 · Dec 5 · 00:00

Researchers unveil new methods to boost LLM inference speed and efficiency

Google Research has introduced "speculative cascades," a novel method to enhance Large Language Model (LLM) efficiency by merging speculative decoding with standard cascades. This hybrid approach aims to reduce computat…
