Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 8h

Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation

A new arXiv paper examines how reliable counterfactual token-credit estimation is in language models. Its central finding is that a common practice, re-feeding the transcript prefix as a fresh prompt, is not equivalent to resuming from the verified decode-time KV state, and that the difference introduces significant noise. This replay noise can change credit estimates, especially at low-margin decision tokens (where the top candidate tokens score almost equally), and in turn changes which tokens are selected as critical. The authors argue that accurate credit estimation requires either batch-invariant kernels or resuming the decoder state, and recommend reporting a replica floor so that the noise inherent in single-sample measurements is accounted for.
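The brief does not describe the paper's implementation. As a rough illustration of why the replay path matters, the plain-Python sketch below shows three things under stated assumptions. First, floating-point addition depends on order, which is one plausible reason different kernels or batch shapes can yield slightly different logits for the same prefix. Second, a tiny perturbation can flip the chosen token when the top-2 margin is small. Third, a credit estimate can be compared against a replica floor. All function names are hypothetical, the numbers are toy values rather than results from the paper, and the floor is assumed here to be the max-minus-min spread of repeated measurements; the paper may define it differently.

```python
def reduction_order_noise():
    """Sum the same three values in two orders; float addition is not associative."""
    left_first = (0.1 + 0.2) + 0.3
    right_first = 0.1 + (0.2 + 0.3)
    return left_first, right_first


def argmax(logits):
    """Index of the largest logit (greedy token choice)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top2_margin(logits):
    """Gap between the largest and second-largest logit."""
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


def flips_under_perturbation(logits, index, eps):
    """True if adding eps to logits[index] changes the greedy choice.

    Hypothetical stand-in for numerical noise between two replay paths.
    """
    perturbed = list(logits)
    perturbed[index] += eps
    return argmax(perturbed) != argmax(logits)


def replica_floor(replica_credits):
    """ASSUMPTION: floor = spread (max - min) of credit values for one token
    measured across repeated, nominally identical replays."""
    return max(replica_credits) - min(replica_credits)


def credit_exceeds_floor(credit, replica_credits):
    """A credit is only treated as meaningful if it exceeds the replica floor."""
    return abs(credit) > replica_floor(replica_credits)


if __name__ == "__main__":
    print("order-dependent sums:", reduction_order_noise())

    low_margin = [2.0, 1.9999999, -1.0]   # toy logits, near tie
    high_margin = [5.0, 1.0, 0.0]         # toy logits, clear winner
    print("low-margin flips:", flips_under_perturbation(low_margin, 1, 1e-6))
    print("high-margin flips:", flips_under_perturbation(high_margin, 1, 1e-6))

    replicas = [0.012, 0.018, 0.009, 0.015]  # toy replica measurements
    print("floor:", replica_floor(replicas))
    print("0.004 above floor?", credit_exceeds_floor(0.004, replicas))
    print("0.05 above floor?", credit_exceeds_floor(0.05, replicas))
```

The same 1e-6 nudge flips the low-margin case but not the high-margin one, which matches the brief's point that low-margin decision tokens are where replay noise most affects credit estimates and critical-token selection.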

IMPACT: Highlights potential unreliability in current methods for attributing model outputs to specific tokens, with implications for research on model interpretability.
