CSULoRA method strengthens LLM safety without sacrificing utility

By the PulseAugur editorial team · [3 sources] · 2026-05-28 22:48

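Researchers have developed CSULoRA, a new post-hoc method for correcting low-rank adaptation (LoRA) adapters in large language models. It targets a known weakness: even a small amount of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of an aligned model. CSULoRA estimates a safety-aligned subspace, then adjusts the LoRA update to retain task-relevant information while mitigating unsafe directions.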
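The summary gives no algorithmic detail, so the following is only a generic sketch of what subspace-based correction of a LoRA update can look like, not the paper's procedure. It assumes a set of "protected" directions has already been estimated (here they are random placeholders) and that the correction removes the part of the update that writes into those directions. Whether CSULoRA removes or retains the component in its estimated subspace, and how that subspace is estimated, are not stated in the sources. The names `correct_lora_factors` and `protected` are hypothetical.

```python
import numpy as np


def correct_lora_factors(B, A, protected):
    """Return LoRA factors whose product has no output component in span(protected).

    B: (d_out, r) and A: (r, d_in) are the adapter factors, so the update is B @ A.
    protected: (d_out, k) matrix whose columns span the directions to leave untouched.
    Assumption: this orthogonal projection stands in for the correction step;
    it is not taken from the CSULoRA paper.
    """
    # Orthonormal basis of the protected subspace.
    q, _ = np.linalg.qr(protected)
    # Project B onto the orthogonal complement; A is unchanged,
    # so the corrected update still has rank at most r.
    B_corrected = B - q @ (q.T @ B)
    return B_corrected, A


rng = np.random.default_rng(0)
d_out, d_in, rank, k = 8, 6, 2, 3

B = rng.normal(size=(d_out, rank))
A = rng.normal(size=(rank, d_in))
protected = rng.normal(size=(d_out, k))  # placeholder, not estimated from a model

B_corrected, A_corrected = correct_lora_factors(B, A, protected)
delta_w = B @ A
delta_w_corrected = B_corrected @ A_corrected

# How much of the corrected update still points along the protected directions.
leakage = np.linalg.norm(protected.T @ delta_w_corrected)
```

An orthogonal projection returns the nearest matrix, in Frobenius norm, that satisfies the constraint, so the adapter changes as little as possible. Because only `B` is modified, the result can be stored as an ordinary rank-`r` adapter.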

Impact: Strengthens LLM safety during fine-tuning, which may make deployments of adapted models more robust.

Ranking rationale: This cluster contains an academic paper detailing a new method for LLM fine-tuning.

AI-generated summary · Google Gemini · from 3 sources.

Coverage sources [3]

arXiv cs.LG TIER_1 English(EN) · Yilang Zhang, Bingcong Li, Georgios B. Giannakis · 2026-06-02 04:00

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models

arXiv:2505.18877v4 Announce Type: replace Abstract: Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence …

arXiv cs.CL TIER_1 English(EN) · Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh, Salima Lamsiyah · 2026-06-01 04:00

CSULoRA: Closest Safe Update Low-Rank Adaptation

arXiv:2605.30640v1 Announce Type: cross Abstract: Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligne…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 22:48

CSULoRA: Closest Safe Update Low-Rank Adaptation

Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned models. Existing safety-preserving LoRA methods …