PulseAugur

CruxEval

PulseAugur coverage of CruxEval — every cluster mentioning CruxEval across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 3 (90 days: 3)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 3 (90 days: 3)
Tier distribution · 90 days
Sentiment · 30 days: 2 days with sentiment data

Recent · 3 items
  1. TOOL · CL_40817 ·

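    Quantization affects LLM performance, with larger models showing greater resilience

    A new research paper examines how quantization affects large language model performance, evaluating models at 2-bit to 6-bit precision. It finds that higher precision generally yields better performance, yet aggressive quantization often retains acceptable accuracy, although some models degrade significantly. Larger models tend to be more resilient to quantization, while mid-sized models (7 to 9 billion parameters) offer a good balance between efficiency and performance.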

  2. TOOL · CL_29426 ·

    New framework StepCodeReasoner boosts code reasoning with execution traces

    Researchers have developed StepCodeReasoner, a new framework designed to improve code reasoning by focusing on intermediate execution states rather than just final outputs. This approach uses structured print statements…

  3. RESEARCH · CL_07050 ·

    Researchers generate verifiable code reasoning data to boost LLM performance

    Researchers have developed a new method to generate verifiable Chain-of-Thought (CoT) rationales for code reasoning by instrumenting code to capture execution traces. This pipeline narrates these traces into natural lan…