New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

By PulseAugur Editorial Team · [2 sources] · 2026-05-01 12:20

Researchers have proposed Decoupled Test-time Synthesis (DoTS), a post-hoc framework that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) for large language models. According to the paper, SFT expands knowledge breadth while RLVR enhances reasoning depth, but training the two sequentially or jointly leads to catastrophic forgetting and gradient conflicts. DoTS instead takes independently trained SFT and RLVR checkpoints and synthesizes their capabilities at inference time using task vector arithmetic, which avoids further parameter updates and reduces computational cost.
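
As a rough illustration of task vector arithmetic in general (not the DoTS procedure itself, whose details are not given in this summary), the sketch below treats a task vector as the difference between a fine-tuned checkpoint and its base model, then adds two such vectors back onto the base. The names `task_vector` and `synthesize`, the weights `alpha` and `beta`, and the toy parameter lists are all assumptions made for this example.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(base: Params, tuned: Params) -> Params:
    """Return tuned - base for every parameter (the 'task vector')."""
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in base
    }


def synthesize(
    base: Params,
    sft: Params,
    rlvr: Params,
    alpha: float = 1.0,  # hypothetical weight on the SFT task vector
    beta: float = 1.0,   # hypothetical weight on the RLVR task vector
) -> Params:
    """Compute base + alpha * tau_sft + beta * tau_rlvr without any training."""
    tau_sft = task_vector(base, sft)
    tau_rlvr = task_vector(base, rlvr)
    return {
        name: [
            b + alpha * s + beta * r
            for b, s, r in zip(base[name], tau_sft[name], tau_rlvr[name])
        ]
        for name in base
    }


# Toy checkpoints that share one base model (values are made up).
base_ckpt = {"w": [1.0, 2.0]}
sft_ckpt = {"w": [1.5, 2.0]}   # SFT changed only the first weight
rlvr_ckpt = {"w": [1.0, 3.0]}  # RLVR changed only the second weight

merged = synthesize(base_ckpt, sft_ckpt, rlvr_ckpt)
```

In this toy case the two checkpoints changed different weights, so the merged parameters carry both changes. With real checkpoints the two task vectors generally overlap, so how they are weighted or combined is the substantive design question.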

Impact: Enables more efficient integration of SFT and RLVR, potentially improving LLM performance on diverse tasks without extensive retraining.

Ranking rationale: The cluster contains an arXiv preprint detailing a new method for integrating SFT and RLVR.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Chaohao Yuan, Chenghao Xiao, Yu Rong, Hong Cheng, Long-Kai Huang · 2026-05-04 04:00

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

arXiv:2605.00610v1. Abstract: SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary stren…

arXiv cs.LG TIER_1 English(EN) · Long-Kai Huang · 2026-05-01 12:20

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential …
