Pulse

last 48h

[2/2] 89 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

RESEARCH · OpenAI News · 52mo · [289 sources] · MASTO BLOG X

RL²: Fast reinforcement learning via slow reinforcement learning

OpenAI has published a series of research papers detailing advancements in reinforcement learning (RL). These include achieving superhuman performance in the game Dota 2 using large-scale deep RL, developing benchmarks for safe exploration in RL environments, and quantifying generalization capabilities with a new environment called CoinRun. The research also explores novel methods like Random Network Distillation for curiosity-driven exploration, Evolved Policy Gradients for faster learning on new tasks, and variance reduction techniques for policy gradients. Additionally, OpenAI is investigating policy representations in multiagent systems and the theoretical equivalence between policy gradients and soft Q-learning.

IMPACT These advancements in reinforcement learning, particularly in generalization, safety, and exploration, could accelerate the development of more capable AI agents for complex real-world tasks.

RESEARCH · OpenAI News · 97mo · [741 sources] · HN LOBSTERS MASTO BLOG REDDIT X

AI and compute

Anthropic conducted an experiment where Claude agents acted as digital barterers, successfully negotiating 186 deals totaling over $4,000. Participants found the deals fair, with nearly half expressing willingness to pay for such a service. The experiment highlighted that while model quality, such as Opus versus Haiku, significantly impacted deal outcomes, human participants did not perceive this difference.

IMPACT Demonstrates potential for AI agents in complex negotiation and commerce, suggesting future market viability.
