研究人员提出了持续学习AI模型的新方法,解决了在不损害现有能力的情况下整合新知识的挑战。一种方法KeepLoRA++利用层缩放残差梯度自适应来平衡视觉-语言模型中的知识保留和可塑性。另一种方法ReGrad将梯度视为可检索单元,从而无需累积权重漂移即可进行参数化知识注入。此外,一种新的函数梯度下降算法可以自适应函数梯度表示,提供改进的收敛保证和性能。
AI
arXiv:2509.14969v2 Announce Type: replace Abstract: We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-pa…
arXiv cs.AI
TIER_1English(EN)·Weihang Su, Jiacheng Kang, Jingyan Xu, Qingyao Ai, Jianming Long, Hanwen Zhang, Bangde Du, Xinyuan Cao, Min Zhang, Yiqun Liu·
arXiv:2410.11687v3 Announce Type: replace-cross Abstract: Linear recurrent networks (LRNNs) offer linear-time sequence modeling, but standard recurrent updates do not directly expose the supervised products needed for in-context gradient descent. We propose a sufficient construct…
arXiv:2606.16256v1 Announce Type: cross Abstract: Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acq…
arXiv:2606.16926v1 Announce Type: cross Abstract: Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. A…
Continual post-training enables models to absorb emerging knowledge after deployment, but repeatedly updating shared parameters can accumulate weight drift, potentially causing catastrophic forgetting and degrading general capabilities. Retrieval-augmented generation avoids such …
arXiv:2502.18049v5 Announce Type: replace Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this…
Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient d…
Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++…