EleutherAI and SynthLabs have partnered to integrate advanced preference learning techniques, including Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), into the widely used GPT-NeoX framework. The collaboration aims to make research in Reinforcement Learning from Human Feedback (RLHF) and related methods more accessible and scalable. The updated GPT-NeoX library now supports reward model training and improved supervised fine-tuning, leveraging the framework's existing optimizations for large-scale distributed training to achieve greater efficiency than comparable libraries.
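For a sense of what DPO optimizes, here is a minimal sketch of the DPO loss in plain PyTorch. This is illustrative only and does not reflect GPT-NeoX's actual API; the function name, argument names, and example values are all hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is the summed log-probability a model assigns to a full
    response; "policy" is the model being trained, "ref" is a frozen
    reference model (typically the supervised fine-tuned checkpoint).
    """
    # Implicit reward for each response: how far the policy has moved
    # from the reference on that response, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy per-response log-probabilities for a batch of 4 preference pairs.
policy_chosen = torch.tensor([-12.0, -9.5, -15.2, -11.1])
policy_rejected = torch.tensor([-13.4, -10.0, -14.8, -12.9])
ref_chosen = torch.tensor([-12.5, -9.8, -15.0, -11.5])
ref_rejected = torch.tensor([-13.0, -9.9, -15.1, -12.5])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

KTO follows a similar pattern but, rather than requiring paired chosen/rejected responses, scores each response individually as desirable or undesirable using a loss motivated by Kahneman-Tversky prospect theory.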