From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

New study finds that overtraining expert models harms model merging

By the PulseAugur editorial team · [1 source] · 2026-06-18 04:00

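A new research paper on arXiv examines how overtraining expert models undermines the effectiveness of model merging. The study covers vision and language modalities across model scales, as well as adaptation methods such as LoRA, and finds that fine-tuning a model for too long in order to fit difficult examples leads to memorization. That memorization causes parameter interference, which in turn degrades performance when the overtrained experts are merged. The researchers propose task-oriented early stopping as a strategy to mitigate the problem and improve merging results.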

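Impact: Overtrained expert models lose performance when merged, which suggests that careful fine-tuning strategies and early stopping are needed to get the most out of the combined model.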
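To make the idea concrete, here is a minimal sketch of the two steps the summary describes: stopping each expert's fine-tuning early, then merging the experts. The summary does not specify the paper's stopping criterion or merging method, so everything here is an assumption for illustration: a patience-based rule on a held-out validation score stands in for task-oriented early stopping, simple task-vector averaging stands in for the merge, and the names `early_stop_step` and `merge_task_vectors` are hypothetical.

```python
from typing import Dict, List

# Toy representation of model weights: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def early_stop_step(val_scores: List[float], patience: int = 2) -> int:
    """Return the index of the checkpoint to keep (assumed stopping rule).

    Scans validation scores (higher is better) and stops once the score
    has failed to improve for `patience` consecutive evaluations.
    """
    best_idx = 0
    for i in range(1, len(val_scores)):
        if val_scores[i] > val_scores[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` evaluations: stop training
    return best_idx


def merge_task_vectors(base: Weights, experts: List[Weights], scale: float = 1.0) -> Weights:
    """Merge experts by averaging their task vectors (expert minus base)
    and adding the scaled average back onto the base weights.

    This is an assumed, generic merging scheme, not the paper's method.
    """
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(e[name][j] - b for e in experts) / len(experts)
            for j, b in enumerate(base_vals)
        ]
    return merged


# Usage: keep the checkpoint chosen by early stopping, then merge the experts.
kept = early_stop_step([0.50, 0.70, 0.72, 0.71, 0.70, 0.90], patience=2)
base = {"w": [0.0, 1.0]}
experts = [{"w": [1.0, 1.0]}, {"w": [0.0, 3.0]}]
merged = merge_task_vectors(base, experts)
```

In this toy run the stopping rule keeps the third checkpoint and never sees the later score, which is the intended trade-off: the expert stops before it drifts far from the base weights, so its task vector is less likely to collide with those of the other experts at merge time.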

Ranking rationale: A research paper published on arXiv that details findings on model merging.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite · 2026-06-18 04:00

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

arXiv:2506.14126v2 Announce Type: replace-cross Abstract: Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via…

