ENTITY Pfadfinder und Pfadfinderinnen Österreichs


PulseAugur coverage of Pfadfinder und Pfadfinderinnen Österreichs: every cluster that mentions the entity across labs, papers, and developer communities, ranked by signal.


Total · 20 over 90d

Releases · 0 over 90d

Papers · 19 over 90d


RECENT · PAGE 1/1 · 20 TOTAL

TOOL · CL_34321 · May 16 · 09:37

LLM alignment: PPO, DPO, or verifier-based RL for 2026?

This article provides a technical guide for selecting the appropriate reinforcement learning technique for aligning large language models in 2026. It contrasts Proximal Policy Optimization (PPO) for Reinforcement Learni…

TOOL · CL_34502 · May 14 · 06:10

New Federated Actor-Critic Framework Enhances Personalized Policy Training

Researchers have developed a new federated actor-critic framework designed for collaborative policy training in environments with varying conditions. This approach allows multiple agents to share a common representation…

TOOL · CL_22524 · May 8 · 04:00

AI model optimizes HAPS base station positioning in windy maritime networks

Researchers have developed a new framework using deep reinforcement learning to dynamically position High-Altitude Platform Stations (HAPS) in maritime networks. This approach specifically addresses challenges posed by …

TOOL · CL_20509 · May 7 · 04:00

HELM system optimizes GPU HBM for generative recommender latency

Researchers have developed HELM, a system designed to optimize the performance of generative recommender models by dynamically managing High Bandwidth Memory (HBM) allocation between embedding (EMB) and KV caches. Exist…

TOOL · CL_20435 · May 7 · 04:00

Counter-Dyna cuts HVAC control training time to 5 weeks

Researchers have developed Counter-Dyna, a novel method for data-efficient reinforcement learning in HVAC control systems. This approach utilizes counterfactual surrogate models that leverage state-space invariances, si…

TOOL · CL_19903 · May 6 · 19:06

vLLM V1 engine rewrite achieves parity with V0 after backend fixes

Hugging Face's vLLM team detailed the process of aligning their new V1 engine with the V0 reference, focusing on ensuring backend parity before addressing Reinforcement Learning (RL) objective changes. They identified a…

TOOL · CL_18782 · May 6 · 04:00

New OGPO algorithm boosts sample efficiency for generative control policies in robotics

Researchers have introduced Off-policy Generative Policy Optimization (OGPO), a novel algorithm designed for sample-efficient finetuning of generative control policies in robotics. OGPO leverages off-policy critic netwo…

TOOL · CL_18538 · May 6 · 04:00

PERSA pipeline uses RLHF to align LLM feedback with instructor style

Researchers have developed PERSA, a novel approach using Reinforcement Learning from Human Feedback (RLHF) to adapt large language models for generating personalized educational feedback. This method specifically target…

TOOL · CL_16702 · May 5 · 13:22

Author demystifies reinforcement learning math with new blog series

A new blog series aims to demystify the mathematics behind reinforcement learning, starting with foundational concepts and progressing towards advanced algorithms like Proximal Policy Optimization (PPO). The initial pos…

TOOL · CL_16233 · May 5 · 04:00

New research shows high entropy leads to symmetry-equivariant policies in Dec-POMDPs

A new paper explores how high entropy regularization can lead to symmetry-equivariant policies in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). The research demonstrates that sufficiently hi…

RESEARCH · CL_16149 · May 5 · 04:00

AI agents leverage reinforcement learning to enhance software test case generation and code coverage

Researchers have developed two novel approaches for automated test case generation using large language models (LLMs) and reinforcement learning. The first method, PPO-LLM, employs Proximal Policy Optimization (PPO) to …

RESEARCH · CL_15452 · May 3 · 04:45

New research refines LLM alignment beyond DPO and RLHF

Researchers are exploring advanced methods for aligning large language models with human preferences, moving beyond traditional Reinforcement Learning from Human Feedback (RLHF). New approaches like Direct Preference Op…

RESEARCH · CL_11904 · May 1 · 04:00

New C++ engine HASE achieves 33M steps/sec for multi-agent RL training

Researchers have developed a new C++ engine called Hide-And-Seek-Engine (HASE) designed to significantly improve the efficiency of training reinforcement learning agents in decentralized, partially observable environmen…

RESEARCH · CL_08685 · Apr 29 · 04:00

xLSTM networks enhance deep reinforcement learning for automated stock trading

Researchers have developed a new automated stock trading system utilizing Extended Long Short-Term Memory (xLSTM) networks combined with deep reinforcement learning (DRL). This approach aims to overcome the limitations …

RESEARCH · CL_06928 · Apr 28 · 04:00

AI framework optimizes land use for ecosystem services in Lake Malawi Basin

Researchers have developed a deep reinforcement learning framework to optimize land-use allocation in the Lake Malawi Basin, aiming to enhance ecosystem service value. The system uses a Proximal Policy Optimization agen…

RESEARCH · CL_06752 · Apr 28 · 04:00

Researchers develop new methods to debias and improve reward models for LLMs

Researchers have developed new methods to improve the reliability and interpretability of reward models (RMs) used in aligning large language models (LLMs). One approach introduces a causally motivated intervention tech…

RESEARCH · CL_06317 · Apr 27 · 14:43

GradMAP AI learns decentralized grid-edge device control with faster training

Researchers have developed GradMAP, a novel gradient-based multi-agent proximal learning method designed for coordinating decentralized grid-edge devices. This approach trains independent neural network policies for eac…

RESEARCH · CL_05416 · Apr 21 · 14:07

DVPO and EVPO advance LLM post-training with novel RL optimization techniques

Researchers have introduced DVPO, a new reinforcement learning framework designed for improving Large Language Model (LLM) post-training, particularly when dealing with noisy or incomplete supervision signals. DVPO util…

SIGNIFICANT · CL_02559 · Apr 15 · 07:00

OpenAI Five AI defeats Dota 2 world champions in historic esports match

OpenAI Five has achieved a significant milestone by defeating the world champions of Dota 2 in two consecutive games at the OpenAI Five Finals. This marks the first time an AI has publicly triumphed over professional es…

RESEARCH · CL_01553 · Jul 20 · 07:00

OpenAI releases Proximal Policy Optimization for simpler, effective reinforcement learning

OpenAI has released Proximal Policy Optimization (PPO), a new reinforcement learning algorithm that offers comparable or superior performance to existing methods while being simpler to implement and tune. PPO strikes a …
