PulseAugur

New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

Researchers have developed Decoupled Test-time Synthesis (DoTS), a post-hoc framework for combining Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. The method targets the catastrophic forgetting and gradient conflicts that arise when the two paradigms are trained sequentially or jointly. Instead of further training, DoTS combines independently trained SFT and RLVR checkpoints at inference time using task vector arithmetic, which avoids parameter updates and reduces computational cost.
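To make "task vector arithmetic" concrete, here is a minimal sketch of the generic technique, not the DoTS procedure itself. A task vector is the element-wise difference between a fine-tuned checkpoint and the shared base model, and several task vectors can be added back onto the base with scaling coefficients. The function names, the toy weights, and the coefficients `alpha` and `beta` are all assumptions made for illustration; the summary does not say how DoTS chooses or applies its vectors at test time.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Return the per-parameter difference finetuned - base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def synthesize(
    base: Weights,
    tau_sft: Weights,
    tau_rl: Weights,
    alpha: float = 1.0,  # assumed scaling for the SFT task vector
    beta: float = 1.0,   # assumed scaling for the RL task vector
) -> Weights:
    """Compute base + alpha * tau_sft + beta * tau_rl with no gradient steps."""
    return {
        name: [
            b + alpha * s + beta * r
            for b, s, r in zip(base[name], tau_sft[name], tau_rl[name])
        ]
        for name in base
    }


# Toy checkpoints standing in for a base model and two post-trained variants.
base = {"w": [1.0, 2.0], "b": [0.5]}
sft = {"w": [1.5, 2.0], "b": [0.25]}
rl = {"w": [1.0, 3.0], "b": [0.5]}

tau_sft = task_vector(sft, base)
tau_rl = task_vector(rl, base)
merged = synthesize(base, tau_sft, tau_rl, alpha=1.0, beta=0.5)
print(merged)  # {'w': [1.5, 2.5], 'b': [0.25]}
```

Because the two checkpoints are only subtracted, scaled, and added, neither model is retrained. The coefficients control how much each checkpoint contributes to the combined weights.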

Impact: Enables more efficient integration of SFT and RLVR, potentially improving LLM performance on diverse tasks without extensive retraining.

Ranking rationale: The cluster contains an arXiv preprint detailing a new method for integrating SFT and RLVR.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv cs.LG TIER_1 English (EN) · Chaohao Yuan, Chenghao Xiao, Yu Rong, Hong Cheng, Long-Kai Huang

    Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

    arXiv:2605.00610v1. Abstract: SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary stren…

  2. arXiv cs.LG TIER_1 English (EN) · Long-Kai Huang

    Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

    SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential …