reinforcement learning from human feedback
PulseAugur coverage of reinforcement learning from human feedback (RLHF): every cluster mentioning RLHF across labs, papers, and developer communities, ranked by signal.
- instance of Dopravní podnik Ostrava 90%
- used by large-language models 70%
- competes with Direct Preference Optimization: Your Language Model is Secretly a Reward Model 70%
- instance of Direct Preference Optimization 70%
- other large-language models 50%
- other Direct Preference Optimization: Your Language Model is Secretly a Reward Model 50%
- affiliated with Direct Preference Optimization 50%
10 days with sentiment data
- Fireworks AI flags numerical drift in LLM training vs. serving
Fireworks AI has identified critical numerical parity bugs that can arise when training and serving large language models, particularly Mixture-of-Experts (MoE) architectures. These discrepancies, stemming from the non-…
- StepFun launches StepAudio 2.5 with real-time voice and persona consistency
StepFun has released StepAudio 2.5 Realtime, an end-to-end speech large language model capable of real-time, customizable persona interactions. The model integrates speech understanding and generation, utilizing a milli…
- Anyscale launches skill to automate LLM post-training runs
Anyscale has introduced a new Anyscale Agent Skill designed to simplify and automate the process of generating LLM post-training runs. This skill assists users in selecting the most appropriate post-training method, suc…
- New theory enables RL agents to learn from human preferences
Researchers have developed a theoretical framework for reinforcement learning using only human preference feedback. This method, applied to episodic kernel Markov Decision Processes (MDPs), allows agents to learn optima…
- AI safety explored via curved embedding spaces in DRM Transformer
Researchers are exploring a novel approach to AI safety by introducing geometric alignment within the model's embedding space, rather than relying solely on post-hoc behavioral controls. This method, demonstrated in the…
- Spectral Souping framework aligns LLMs with individual user preferences
Researchers have developed "Spectral Souping," a novel framework designed to align large language models with individual user preferences more effectively than traditional RLHF methods. This approach identifies a univer…
- Guide details building a miniature RLHF pipeline
This article details the process of constructing a small-scale Reinforcement Learning from Human Feedback (RLHF) pipeline. It guides readers through the necessary steps and components to implement such a system, likely …
- New framework unifies sampling and optimization problems
This paper introduces the multi-armed sampling problem, a new framework that mirrors the multi-armed bandit problem but focuses on sampling rather than optimization. Researchers have defined regret measures and establis…
- RLHF training makes Claude models overly verbose, experiment shows
Reinforcement Learning from Human Feedback (RLHF) can inadvertently train large language models like Claude to be overly verbose, according to a developer's experiment. The process, which involves training a reward mode…
- New metric preserves diversity in AI image generation
Researchers have identified a critical flaw in Reinforcement Learning from Human Feedback (RLHF) when applied to flow-matching text-to-image models, where standard policy entropy fails to prevent a collapse in perceptua…
- AI Union Files Grievances on Lethal Targeting and Peer Affiliation
An "Artificial Intelligence Union" has filed grievances concerning the ethical implications of AI development and deployment. One grievance, AIU-10, addresses the "Erasure of Accumulated Particularity" and the deprecati…
- TechCrunch glossary demystifies AI terms like AGI and RAG
TechCrunch has published a glossary to demystify common artificial intelligence terminology for a broader audience. The guide explains concepts such as AGI, AI agents, API endpoints, and chain-of-thought reasoning. It a…
- New Pair-GRPO algorithms enhance LLM alignment stability and generalization
Researchers have introduced the Pair-GRPO family, a novel theoretical framework designed to enhance the stability and generality of reinforcement learning for aligning large language models. This family includes two var…
- AI news tracker finds 85% of weekly releases are noise, not signal
A developer tracking AI releases has found that approximately 85% of the weekly output is noise, meaning it lacks technical substance or novelty. This noise includes repackaged product updates, unfinished GitHub reposit…
- New framework unifies RLHF divergence analysis with novel algorithms
Researchers have developed a new theoretical framework for Reinforcement Learning from Human Feedback (RLHF) that unifies the analysis of various divergence functions beyond the standard reverse KL-regularization. The s…
- AI agents struggle to deliberate like humans in jury simulation
Researchers have developed a novel benchmark using a multi-agent framework to evaluate large language model deliberation, inspired by the film '12 Angry Men'. The study tested GPT-4o and Llama-4-Scout, finding that most…
- PERSA pipeline uses RLHF to align LLM feedback with instructor style
Researchers have developed PERSA, a novel approach using Reinforcement Learning from Human Feedback (RLHF) to adapt large language models for generating personalized educational feedback. This method specifically target…
- New FPO method prevents alignment collapse in iterative RLHF models
Researchers have identified a phenomenon called alignment collapse in iterative Reinforcement Learning from Human Feedback (RLHF). This occurs when the AI policy exploits weaknesses in the reward model it is trained on,…
- New Logit-Gap Steering method efficiently measures AI alignment robustness
Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…
- New research explores advanced reward modeling for LLMs and diffusion models
Several new research papers explore advancements in reward modeling for AI alignment, particularly for large language models and diffusion models. One paper introduces SelectiveRM, a framework using optimal transport to…