CruxEval
PulseAugur coverage of CruxEval — every cluster mentioning CruxEval across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
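Quantization affects LLM performance, with larger models showing greater resilience
A new research paper examines how quantization affects large language model performance, evaluating models at precisions from 2 to 6 bits. The study finds that although higher precision generally yields better performance, aggressive quantization often preserves acceptable accuracy, though some models degrade significantly. Larger models tend to be more resilient to quantization, while mid-sized models (7 to 9 billion parameters) offer a good balance between efficiency and performance.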
-
New framework StepCodeReasoner boosts code reasoning with execution traces
Researchers have developed StepCodeReasoner, a new framework designed to improve code reasoning by focusing on intermediate execution states rather than just final outputs. This approach uses structured print statements…
-
Researchers generate verifiable code reasoning data to boost LLM performance
Researchers have developed a new method to generate verifiable Chain-of-Thought (CoT) rationales for code reasoning by instrumenting code to capture execution traces. This pipeline narrates these traces into natural lan…