Cola
PulseAugur coverage of Cola — every cluster mentioning Cola across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
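LLM pretraining research explores sparse vs. dense and low-rank methods
Two new research papers explore methods for efficient pretraining of large language models. The first compares dense and sparse mixture-of-experts (MoE) Transformer architectures at small scale, finding that MoE models improve validation loss when matched on active parameters but do not outperform dense models when total parameter capacity is equal. The second studies various low-rank pretraining techniques, showing that even when validation perplexity is similar, these methods converge to geometrically different solutions and do not fully replicate the generalization ability or internal representations of full-rank training.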
-
Lost in State Space: Probing Frozen Mamba Representations
A new research paper investigates the internal workings of Mamba, a recurrent neural network architecture. The study tested the hypothesis that Mamba's state could directly yield semantic sentence summaries without addi…
-
LoRA fine-tuning research suggests rank 1 is sufficient, proposes data-aware initialization
Three new research papers explore methods to optimize LoRA fine-tuning for large language models. One paper proposes reducing the LoRA rank threshold to 1 for binary classification tasks, showing competitive performance…