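Researchers have developed new methods for improving supervised fine-tuning (SFT) of large language models. One method, FisherAdapTune, uses Fisher information geometry to dynamically select which parameter groups to adapt, improving both in-distribution performance and zero-shot transfer. Another group of methods, including Target-SFT and PriFT, reinterprets SFT as target-distribution design. These techniques aim to create more stable and effective training objectives by better aligning fine-tuning with the model's pretrained knowledge, achieving state-of-the-art results on a range of reasoning and code-generation tasks.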
AI
arXiv:2606.10196v1 Announce Type: cross Abstract: Parameter-efficient fine-tuning (PEFT) aims to adapt pretrained models with a small trainable parameter subset; however, most existing methods choose this subset from fixed architectural heuristics rather than using dynamic, task-…
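The abstract above is truncated, so the paper's actual selection rule is not shown here. As a rough illustration of the general idea of choosing trainable parameter groups from Fisher information instead of a fixed heuristic, here is a minimal sketch that assumes a toy logistic model, an empirical diagonal Fisher estimate, and a simple top-k rule. The names `group_fisher_scores`, `select_groups`, `group_a`, and `group_b` are hypothetical and do not come from FisherAdapTune.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def per_example_grad(params, x, y):
    """Gradient of the negative log-likelihood of a logistic model for one example."""
    z = sum(w * xi for w, xi in zip(params, x))
    err = sigmoid(z) - y
    return [err * xi for xi in x]


def group_fisher_scores(params, data, groups):
    """Sum an empirical diagonal Fisher estimate within each parameter group.

    Assumption: the Fisher diagonal is approximated by the mean squared
    per-example gradient using the observed labels (the "empirical Fisher"),
    not by sampling labels from the model.
    """
    fisher = [0.0] * len(params)
    for x, y in data:
        grad = per_example_grad(params, x, y)
        for i, g in enumerate(grad):
            fisher[i] += g * g / len(data)
    return {name: sum(fisher[i] for i in idx) for name, idx in groups.items()}


def select_groups(scores, k):
    """Return the names of the k groups with the largest Fisher score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy setup: four weights split into two hypothetical groups.
params = [0.0, 0.0, 0.0, 0.0]
data = [
    ([1.0, 2.0, 0.0, 0.1], 1),
    ([-1.0, -2.0, 0.1, 0.0], 0),
    ([2.0, 1.0, 0.0, -0.1], 1),
]
groups = {"group_a": [0, 1], "group_b": [2, 3]}

scores = group_fisher_scores(params, data, groups)
trainable = select_groups(scores, k=1)
print(scores, trainable)
```

In this toy data the first two features carry almost all of the gradient signal, so `group_a` is the group that would be unfrozen. A dynamic scheme would recompute such scores during training instead of fixing the choice up front.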
arXiv:2606.11189v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the…
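To make the contrast with a one-hot target concrete, the following sketch compares the standard per-token SFT loss with a cross-entropy against a softened target that mixes the observed token with the model's own prior distribution. This is a generic illustration, not the objective from the truncated abstract above: the mixing weight `alpha` and the helper `soft_target` are assumptions introduced here.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_nll(logits, target):
    """Standard SFT loss for one token: negative log-likelihood of the observed token."""
    return -math.log(softmax(logits)[target])


def soft_target(prior, target, alpha):
    """Hypothetical target design: (1 - alpha) * one-hot + alpha * model prior."""
    q = [alpha * p for p in prior]
    q[target] += 1.0 - alpha
    return q


def cross_entropy(q, logits):
    """Cross-entropy between a target distribution q and the model's prediction."""
    log_probs = [math.log(p) for p in softmax(logits)]
    return -sum(qi * lp for qi, lp in zip(q, log_probs))


logits = [2.0, 1.0, 0.1]  # model scores over a toy 3-token vocabulary
target = 1                # the observed (demonstrated) token
prior = softmax(logits)   # model prior, treated as a fixed constant here

hard_loss = one_hot_nll(logits, target)
soft_loss = cross_entropy(soft_target(prior, target, alpha=0.3), logits)
print(hard_loss, soft_loss)
```

With `alpha = 0` the softened target reduces to the one-hot target and the two losses coincide. Larger values of `alpha` pull the target toward what the model already believes, which is one simple way to avoid fitting strictly to a possibly noisy token.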
arXiv:2606.09396v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) is an efficient approach for downstream task adaptation and often serves as the initialization stage for reinforcement learning (RL), but it can show weaker generalization than RL. A key limitation is its off-policy objective: SFT fits fixed demonstra…
Medium — fine-tuning tag
English (EN) · Panisetti Prudhviraj
<div class="medium-feed-item"><p class="medium-feed-snippet">Imagine I just hired a professional pianist who already knows how to play all kinds of music (Jazz, Pop, Classical… everything).</p><p class="medium-feed-link"><a href="https://infiniteknowledge.medium.com/unders…