The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

AI research examines the expressive power of Transformers and the benefits of curriculum learning

By the PulseAugur editorial team · [2 sources] · 2026-05-05 04:00

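Two new research papers explore theoretical aspects of Transformer models and their reasoning abilities. The first analyzes the expressive power of standard Transformer decoders with softmax attention and shows how they can simulate Turing machines with logarithmic scaling. The second provides a theoretical framework for curriculum learning in LLM post-training, showing that it can improve sample complexity on reasoning tasks by an order of magnitude compared with non-curriculum approaches.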

Impact: These theoretical advances could lead to more efficient and more capable AI models for complex reasoning tasks.

Ranking rationale: Two academic papers posted on arXiv that discuss theoretical aspects of AI models and training techniques.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL · TIER_1 · English (EN) · Stephan Eckstein · 2026-05-18 08:57

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transformer decoders with softmax attention an…

arXiv cs.LG · TIER_1 · English (EN) · Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Hau-San Wong, Qingfu Zhang, Taiji Suzuki · 2026-05-05 04:00

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

arXiv:2511.07372v3 Announce Type: replace Abstract: Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and…

