EleutherAI and SynthLabs have partnered to integrate advanced preference learning techniques, including Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), into the widely used GPT-NeoX framework. The collaboration aims to make research in Reinforcement Learning from Human Feedback (RLHF) and related methods more accessible and scalable. The updated GPT-NeoX library now supports reward model training and improved supervised fine-tuning, leveraging the framework's existing optimizations for large-scale distributed training to achieve greater efficiency than comparable libraries.
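For a sense of what DPO optimizes, here is a minimal sketch of the DPO loss in plain PyTorch. This is illustrative only and does not reflect GPT-NeoX's actual API; the function name, argument names, and example values are all hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is the summed log-probability a model assigns to a full
    response; "policy" is the model being trained, "ref" is a frozen
    reference model (typically the supervised fine-tuned checkpoint).
    """
    # Implicit reward for each response: how far the policy has moved
    # from the reference on that response, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy per-response log-probabilities for a batch of 4 preference pairs.
policy_chosen = torch.tensor([-12.0, -9.5, -15.2, -11.1])
policy_rejected = torch.tensor([-13.4, -10.0, -14.8, -12.9])
ref_chosen = torch.tensor([-12.5, -9.8, -15.0, -11.5])
ref_rejected = torch.tensor([-13.0, -9.9, -15.1, -12.5])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

KTO follows a similar pattern but, rather than requiring paired chosen/rejected responses, scores each response individually as desirable or undesirable using a loss motivated by Kahneman-Tversky prospect theory.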