Reinforcement Learning From Human Feedback (RLHF)

PulseAugur coverage of Reinforcement Learning From Human Feedback (RLHF): every cluster mentioning the term across labs, papers, and developer communities, ranked by signal.

Total: 7 over 90d
Releases: 0 over 90d
Papers: 6 over 90d

TIER MIX · 90D

research 5
tool 1
meme 1

RECENT · 7 TOTAL

RESEARCH · CL_86663 · Jun 11 · 11:19

AI reward models show tension between helpfulness and harmlessness

A new research paper explores the tension between helpfulness and harmlessness in AI reward models, a crucial component of reinforcement learning from human feedback (RLHF). The study found that models trained on mixed …

RESEARCH · CL_82101 · Jun 9 · 07:57

New method leverages reward model states for better AI feedback

Researchers have developed a new method called Representation-Aware Advantage Estimation (GraphAE) that enhances reinforcement learning from human feedback (RLHF). This technique utilizes the richer information encoded …

TOOL · CL_79751 · Jun 9 · 04:00

New RePO framework enhances LLM training with regret minimization

Researchers have introduced a new framework called Regret-based Preference Optimization (RePO) for training large language models using human feedback. RePO reframes the process from reward maximization to regret minimi…

RESEARCH · CL_46766 · May 24 · 07:15

New AI Alignment Method Mimics Human Cognitive Processes

A new research paper proposes a method for creating AI decision-making models that are more faithful to human cognitive processes. This approach aims to improve AI alignment by incorporating heuristics and structured th…

RESEARCH · CL_48581 · May 22 · 14:00

New theory enables RL agents to learn from human preferences

Researchers have developed a theoretical framework for reinforcement learning using only human preference feedback. This method, applied to episodic kernel Markov Decision Processes (MDPs), allows agents to learn optima…

RESEARCH · CL_29313 · May 12 · 09:46

New framework improves reward modeling for diverse human preferences

Researchers have developed a new framework called Anchor-guided Variance-aware Reward Modeling to address limitations in standard reward models when dealing with diverse human preferences. This method enhances existing …

MEME · CL_25269 · May 10 · 17:59

AI in Sports Glossary Adds RLHF Term

A new term, "Reinforcement Learning From Human Feedback (RLHF)," has been added to a glossary focused on Artificial Intelligence in Sports. This addition aims to expand the resource's coverage of AI concepts relevant to…
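Several items above concern reward models, the RLHF component that turns human preference comparisons into a scalar score. The following is a minimal sketch, not the method of any paper listed here. It uses the widely used Bradley-Terry formulation, which models the probability that a "chosen" response beats a "rejected" one as the sigmoid of their reward difference, and trains by minimizing the negative log of that probability. The function names are hypothetical, and each response gets a plain learned scalar instead of a neural network's output.

```python
import math


def preference_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the 'chosen' response is preferred:
    sigmoid(r_chosen - r_rejected), computed without overflow."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of one human comparison:
    -log sigmoid(margin) == log(1 + exp(-margin)), computed stably."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def fit_rewards(num_items, comparisons, lr=0.5, steps=200):
    """Hypothetical toy trainer: learns one scalar reward per response from
    (winner, loser) index pairs by full-batch gradient descent on the mean
    pairwise loss. Stands in for a neural reward model."""
    rewards = [0.0] * num_items
    for _ in range(steps):
        grads = [0.0] * num_items
        for winner, loser in comparisons:
            # d(loss)/d(margin) = -(1 - sigmoid(margin))
            g = -(1.0 - preference_probability(rewards[winner], rewards[loser]))
            grads[winner] += g  # margin grows with the winner's reward
            grads[loser] -= g   # and shrinks with the loser's reward
        for i in range(num_items):
            rewards[i] -= lr * grads[i] / len(comparisons)
    return rewards
```

For example, `fit_rewards(3, [(0, 1), (1, 2), (0, 2)])` ranks response 0 highest and response 2 lowest, because 0 wins both of its comparisons and 2 loses both. A real reward model replaces the per-response scalars with a network that scores (prompt, response) pairs, but the pairwise loss keeps the same shape.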