PulseAugur
reinforcement learning

PulseAugur coverage of reinforcement learning: every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.

Total · 30d: 234 (234 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 223 (223 over 90d)
TIMELINE
  1. 2026-05-18 · research milestone · A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.
SENTIMENT · 30D

25 days with sentiment data

RECENT · PAGE 7/10 · 200 TOTAL
  1. TOOL · CL_41868

    New CIG reward method enhances reinforcement learning exploration

    Researchers have introduced Conditional Information Gain (CIG), a novel reward mechanism for reinforcement learning designed to improve exploration strategies. CIG addresses limitations of existing methods by providing …

  2. RESEARCH · CL_41798

    AI framework optimizes aeroengine pipe design for manufacturing

    Researchers have developed a new reinforcement learning framework, called FPRO, to optimize the design and manufacturing of free-form pipes in aeroengines. This approach integrates domain-specific manufacturing knowledg…

  3. RESEARCH · CL_42791

    Mahjong RL simulator Mahjax achieves 2M steps/sec on GPUs

    Researchers have developed Mahjax, a new GPU-accelerated simulator for the complex game of Riichi Mahjong, implemented in JAX. This tool is designed to facilitate reinforcement learning research, particularly for agents…

  4. TOOL · CL_39391

    Reinforcement learning explained: policies, MDPs, and trajectories

    This article explains how reinforcement learning agents make decisions by defining key concepts. It covers policies, Markov Decision Processes (MDPs), and trajectories. The series aims to build understanding towards the…

  5. RESEARCH · CL_39995

    New research advances optimization and reinforcement learning theory

    Researchers have developed new theoretical frameworks for optimizing decision-making processes in machine learning. One paper introduces regret-based stopping criteria for Bayesian optimization, ensuring solutions are w…

  6. TOOL · CL_41182

    New RL jailbreak method exploits LRM attention patterns

    Researchers have developed a new jailbreak method specifically targeting Large Reasoning Models (LRMs), which are known for their step-by-step problem-solving abilities. The method leverages reinforcement learning and i…

  7. RESEARCH · CL_39980

    New flow matching methods enhance generative modeling and RL

    Researchers are advancing flow matching techniques for generative modeling across various domains. New methods like Kinetic Path Energy (KPE) and Kinetic Trajectory Shaping (KTS) aim to improve generation quality by ana…

  8. RESEARCH · CL_39989

    Reinforcement learning optimizes physical activity for health biomarkers

    Researchers have developed a novel offline reinforcement learning algorithm to create personalized physical activity recommendations. This algorithm analyzes step count data and health biomarkers from the All of Us Rese…

  9. TOOL · CL_38815

    Latent visual reasoning tokens prove non-essential for inference

    Researchers have investigated the role of latent visual reasoning, a technique that incorporates visual evidence into multimodal reasoning by using continuous latent tokens before text generation. Their findings suggest…

  10. TOOL · CL_38262

    DiPRL method learns discrete programmatic policies for reinforcement learning

    Researchers have developed DiPRL, a novel method for learning discrete programmatic policies in reinforcement learning. This approach aims to overcome the performance degradation often seen when converting continuous pr…

  11. TOOL · CL_38270

    Reinforcement learning models customer retail journeys for layout optimization

    Researchers have developed a new reinforcement learning (RL) framework to model customer movement in retail environments, aiming to provide practical insights for store layout optimization. This approach treats customer…

  12. TOOL · CL_35221

    New PRISM framework corrects SFT flaws in multimodal LLM training

    New research from institutions including the Hong Kong University of Science and Technology (Guangzhou) reveals a critical flaw in the common post-training paradigm for multimodal large language models (MLLMs). The stan…

  13. TOOL · CL_34696

    Developer uses domain randomization to train robust reinforcement learning agents

    A developer has made progress in training reinforcement learning agents using domain randomization. This technique helps create more robust agents, and the developer has successfully implemented it to improve a bot's ab…

  14. RESEARCH · CL_36602

    New OptMuon method enhances stochastic optimization with adaptive momentum

    Researchers have introduced OptMuon, a novel adaptive momentum orthogonalization method for stochastic nonconvex optimization that calibrates update magnitudes from observed trajectories. This approach combines Muon-sty…

  15. TOOL · CL_36050

    New method enhances vision-language models with group revision

    Researchers have introduced a new group-revision optimization paradigm to improve object-level grounding in large vision-language models. This method addresses the limitations of sparse, response-level rewards in existi…

  16. TOOL · CL_36969

    RL agent controls GenAI access to boost student learning

    A new research paper proposes using reinforcement learning to control when students can access generative AI tools in educational settings. The study found that strategically timed access, managed by an RL agent, improv…

  17. TOOL · CL_36068

    New E²PO framework enhances generative model alignment with human preference

    Researchers have introduced a new framework called Embedding-perturbed Exploration Preference Optimization (E²PO) to address limitations in aligning generative models with human intent using reinforcement learning. Exis…

  18. TOOL · CL_36975

    Lamarckian inheritance benefits robots in predictable, dynamic environments

    Researchers have explored the impact of Lamarckian inheritance on evolutionary dynamics in dynamic environments for robotic agents. Their findings suggest that the benefit of Lamarckian inheritance, where learned traits…

  19. TOOL · CL_33404

    New framework combines knowledge and RL for vehicle routing problems

    Researchers have developed a new framework for solving the Capacitated Vehicle Routing Problem (CVRP), a complex logistics challenge. Their approach integrates knowledge-based heuristics with reinforcement learning, bre…

  20. TOOL · CL_30955

    New framework unifies sampling and optimization problems

    This paper introduces the multi-armed sampling problem, a new framework that mirrors the multi-armed bandit problem but focuses on sampling rather than optimization. Researchers have defined regret measures and establis…