PulseAugur

New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

Researchers have developed a post-hoc framework called Decoupled Test-time Synthesis (DoTS) to integrate Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) in large language models. The method addresses the catastrophic forgetting and gradient conflicts that arise when the two paradigms are trained sequentially or jointly. Instead, DoTS synthesizes the capabilities of independently trained SFT and RLVR checkpoints at inference time using task vector arithmetic, avoiding further parameter updates and significantly reducing computational cost.
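The task vector arithmetic the summary refers to can be illustrated with the standard recipe: subtract the base model's weights from each fine-tuned checkpoint to get a task vector, then add scaled task vectors back onto the base. The sketch below is a generic toy illustration of that idea, not the DoTS algorithm itself (the paper's specific decoupling step is not reproduced here); the parameter dicts and coefficient values are hypothetical.

```python
import numpy as np

# Hypothetical toy checkpoints; real LLMs have many large weight tensors.
base = {"w": np.array([1.0, 2.0])}   # pre-trained base model
sft  = {"w": np.array([1.5, 2.5])}   # SFT checkpoint
rlvr = {"w": np.array([0.5, 2.0])}   # RLVR checkpoint

def task_vector(ckpt, base):
    # A task vector is the fine-tuned weights minus the base weights.
    return {k: ckpt[k] - base[k] for k in base}

def synthesize(base, vectors, coeffs):
    # Merge by adding scaled task vectors to the base: no gradient updates,
    # so the cost is a single pass over the parameters.
    merged = {k: v.copy() for k, v in base.items()}
    for tv, a in zip(vectors, coeffs):
        for k in merged:
            merged[k] += a * tv[k]
    return merged

tv_sft  = task_vector(sft, base)     # [0.5, 0.5]
tv_rlvr = task_vector(rlvr, base)    # [-0.5, 0.0]
merged  = synthesize(base, [tv_sft, tv_rlvr], coeffs=[1.0, 1.0])
# merged["w"] -> [1.0, 2.5]
```

The merged weights can then be loaded for inference directly, which is what makes the integration "test-time" rather than a retraining step.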

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Enables more efficient integration of SFT and RLVR, potentially improving LLM performance across diverse tasks without extensive retraining.

RANK_REASON The cluster contains an arXiv preprint detailing a new method for integrating SFT and RLVR.

Read on arXiv cs.LG →

COVERAGE [2]

  1. arXiv cs.LG TIER_1 · Chaohao Yuan, Chenghao Xiao, Yu Rong, Hong Cheng, Long-Kai Huang

    Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

    arXiv:2605.00610v1 Announce Type: new Abstract: SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary stren…

  2. arXiv cs.LG TIER_1 · Long-Kai Huang

    Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

    SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential …