CruxEval
PulseAugur coverage of CruxEval — every cluster mentioning CruxEval across labs, papers, and developer communities, ranked by signal.
-
Quantization impacts LLM performance, with larger models showing more resilience
A new research paper explores the impact of quantization on large language model performance, examining models from 2-bit to 6-bit precision. The study found that while higher precision generally leads to better perform…
-
New framework StepCodeReasoner boosts code reasoning with execution traces
Researchers have developed StepCodeReasoner, a new framework designed to improve code reasoning by focusing on intermediate execution states rather than just final outputs. This approach uses structured print statements…
-
Researchers generate verifiable code reasoning data to boost LLM performance
Researchers have developed a new method to generate verifiable Chain-of-Thought (CoT) rationales for code reasoning by instrumenting code to capture execution traces. This pipeline narrates these traces into natural lan…
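As a rough illustration of what capturing an execution trace can look like, the Python sketch below records the local variables before each line of a function runs, using the standard `sys.settrace` hook. It is not the pipeline from the paper summarized above, whose details are cut off in this blurb. The names `trace_locals` and `running_sum` are hypothetical and chosen only for this example.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record (line_offset, locals) before each line executes."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))  # shallow snapshot
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever trace hook was active before.
        sys.settrace(previous)
    return result, steps


def running_sum(xs):
    # Hypothetical toy function used only to demonstrate the tracer.
    total = 0
    for x in xs:
        total += x
    return total


result, steps = trace_locals(running_sum, [1, 2, 3])

# Turn each recorded state into a simple line of text.
for offset, state in steps:
    print(f"line +{offset}: {state}")
```

Each recorded step pairs a line offset with the variable values at that point, so any narrated rationale built from it can be checked against what the program actually did. The snapshots are shallow copies, which is enough here because the toy function never mutates its input.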