LiveCodeBench V6
PulseAugur coverage of LiveCodeBench V6 — every cluster mentioning LiveCodeBench V6 across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 1 day
-
New self-distillation methods improve large language model performance on reasoning tasks
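Researchers have developed new self-distillation techniques that improve the performance of large language models without relying on external feedback. AVSD (Adaptive View Self-Distillation) balances consensus signals across multiple privileged-information views and uses view-specific residuals to strengthen learning. Self-Policy Distillation (SPD) extracts a capability subspace from gradients to improve performance and generalization, particularly in code generation and mathematical reasoning. CEPO (Contrastive Evidence Policy Optimization) sharpens credit assignment on key tokens by contrasting correct and incorrect answers, which improves accuracy on multimodal mathematical reasoning benchmarks.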
-
Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
Researchers have developed a new framework called Verbal Process Supervision (VPS) that enhances the reasoning capabilities of large language models without requiring gradient updates. This method utilizes structured na…
-
Google DeepMind launches Deep Think for Gemini Ultra subscribers
Google DeepMind has released a new AI capability called Deep Think, now available to Google AI Ultra subscribers via the Gemini app. This feature utilizes parallel thinking techniques, allowing the model to explore mult…