Brief · PulseAugur

RESEARCH · arXiv cs.CL English (EN) · 18h · [2 sources]

Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models

A new study published on arXiv measures post-hoc falsification operators for small, frozen code models and finds that most of them do not improve accuracy over standard baselines such as Best-of-N. The authors identify a "coverage wall" and "capability scissors" as the key limitations. One approach, "expression-layer recovery," did show promise: it recovered correct programs that standard code extractors discard, improving the results of DeepSeek-Coder-1.3B on benchmarks such as HumanEval+.
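The brief does not describe the paper's harness, so the following is only a hypothetical illustration of the general failure mode it mentions: a strict extractor that keeps only the first fenced block can throw away a correct program, while a more lenient extractor feeding a Best-of-N selector can recover it. Every name here (`strict_extract`, `lenient_extract`, `score`, `best_of_n`) is an assumption for illustration, scoring candidates by passing unit checks is an assumed selection rule, and none of it is the authors' "expression-layer recovery" method.

```python
import ast
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical pattern for Markdown code fences, optionally tagged python/py.
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)

Check = Callable[[Dict], bool]


def strict_extract(completion: str) -> List[str]:
    """Hypothetical strict extractor: keep only the first fenced block."""
    match = FENCE_RE.search(completion)
    return [match.group(1)] if match else []


def parses(src: str) -> bool:
    """True if the text is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


def lenient_extract(completion: str) -> List[str]:
    """Hypothetical lenient extractor: every fenced block plus the raw text,
    keeping only candidates that parse as Python."""
    candidates = FENCE_RE.findall(completion) + [completion]
    return [c for c in candidates if c.strip() and parses(c)]


def score(src: str, checks: List[Check]) -> int:
    """Count how many checks a candidate passes.
    WARNING: exec runs untrusted code without any sandbox; illustration only."""
    env: Dict = {}
    try:
        exec(src, env)
    except Exception:
        return 0
    passed = 0
    for check in checks:
        try:
            passed += bool(check(env))
        except Exception:
            pass
    return passed


def best_of_n(completions: List[str],
              extract: Callable[[str], List[str]],
              checks: List[Check]) -> Tuple[Optional[str], int]:
    """Best-of-N selection: return the highest-scoring extracted candidate."""
    best: Tuple[Optional[str], int] = (None, 0)
    for completion in completions:
        for src in extract(completion):
            s = score(src, checks)
            if s > best[1]:
                best = (src, s)
    return best


# Usage: the first fence is a usage snippet, the second holds the real solution.
completion = (
    "Usage:\n```\nadd(1, 2)\n```\n"
    "Solution:\n```python\ndef add(a, b):\n    return a + b\n```\n"
)
checks: List[Check] = [
    lambda env: env["add"](1, 2) == 3,
    lambda env: env["add"](-1, 1) == 0,
]
strict_result = best_of_n([completion], strict_extract, checks)
lenient_result = best_of_n([completion], lenient_extract, checks)
print(strict_result)
print(lenient_result)
```

In this toy case the strict extractor only sees `add(1, 2)`, which raises `NameError` and scores zero, so nothing is selected; the lenient extractor also surfaces the second block, which passes both checks. Real harnesses would sandbox execution and use the benchmark's own tests.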

IMPACT: The findings suggest that current methods for verifying and repairing code generated by small models are insufficient, and highlight the need for better evaluation harnesses.

Hugging Face
arXiv
MBPP+
HumanEval+
DeepSeek-Coder-1.3B