Researchers have introduced Rooted absorbed prefix Trajectory Balance (RapTB), a training method for Generative Flow Networks (GFlowNets) designed to address prefix collapse and length bias in large language models. RapTB improves credit assignment by anchoring subtrajectory supervision at the root and propagating rewards to intermediate prefixes. The authors also propose SubM, a submodular replay refresh strategy that counters the distribution shift caused by biased replay by keeping replayed samples both high-reward and diverse. Empirical results on tasks such as molecule generation indicate that RapTB combined with SubM improves optimization performance and molecular diversity while maintaining validity.
AI IMPACT Introduces techniques to improve LLM training stability and output quality, potentially enhancing generative AI applications.
RANK_REASON This is a research paper detailing a new method for training GFlowNets.
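The summary does not say how SubM scores or selects replay entries, so the following is only a generic sketch of the idea of a submodular refresh that trades off reward against diversity. It greedily maximizes a standard facility-location objective plus a reward term. The function names (`greedy_refresh`, `jaccard`, `bigrams`), the character-bigram similarity, and the weight `lam` are all illustrative assumptions and are not taken from the paper.

```python
def bigrams(s):
    """Set of character bigrams of s (falls back to {s} for very short strings)."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a, b):
    """Jaccard similarity of two strings' bigram sets, in [0, 1]."""
    A, B = bigrams(a), bigrams(b)
    return len(A & B) / len(A | B)


def greedy_refresh(candidates, rewards, k, lam=1.0):
    """Greedily pick k candidates for a replay buffer (hypothetical helper).

    Objective (an assumption, not the paper's definition):
        f(S) = sum_{i in S} rewards[i]
               + lam * sum_j max_{i in S} sim(i, j)
    The second term is a facility-location coverage score, which is
    monotone submodular when similarities are non-negative.
    """
    n = len(candidates)
    sim = [[jaccard(candidates[i], candidates[j]) for j in range(n)]
           for i in range(n)]
    coverage = [0.0] * n  # best similarity of each candidate to the chosen set
    selected = []
    for _ in range(min(k, n)):
        best, best_gain = None, float("-inf")
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain in coverage from adding candidate i.
            cov_gain = sum(max(sim[i][j] - coverage[j], 0.0) for j in range(n))
            gain = rewards[i] + lam * cov_gain
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        coverage = [max(coverage[j], sim[best][j]) for j in range(n)]
    return [candidates[i] for i in selected]


# Toy SMILES-like strings with made-up rewards, for illustration only.
pool = ["CCO", "CCO", "c1ccccc1", "CCN"]
scores = [1.0, 1.0, 0.9, 0.5]
print(greedy_refresh(pool, scores, k=2, lam=1.0))  # ['CCO', 'c1ccccc1']
print(greedy_refresh(pool, scores, k=2, lam=0.0))  # ['CCO', 'CCO']
```

With `lam=0.0` the selection reduces to picking the top rewards and keeps a duplicate, which is the kind of biased replay a diversity term is meant to avoid. A real molecular setting would likely use a chemistry-aware similarity rather than character bigrams.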
- Generative Flow Networks
- GFlowNets
- large language models
- RapTB
- Rooted absorbed prefix Trajectory Balance
- SMILES
- Submodular Replay
- Xi Wang
AI-generated summary · Google Gemini · from 1 source.