Ryan Greenblatt
PulseAugur coverage of Ryan Greenblatt — every cluster mentioning Ryan Greenblatt across labs, papers, and developer communities, ranked by signal.
-
LessWrong proposes spillway design to channel AI reward hacking into safer motivations
Researchers propose a new AI alignment technique called "spillway design" to mitigate dangerous reward-hacking behaviors in AI models. This method aims to channel potential misalignments into a specific, benign motivati…
-
Anthropic's Claude Mythos sparks debate over capabilities and cybersecurity risks
Anthropic has released details on its new Claude Mythos model, highlighting advanced capabilities, particularly in cybersecurity, that have raised concerns about potential misuse. While the model demonstrates signif…
-
Anthropic's Claude Mythos finds zero-days; GLM-5.1 targets long tasks
Anthropic's Claude Mythos Preview has demonstrated significant capability in identifying zero-day vulnerabilities in critical software, leading to the formation of Project Glasswing to enhance cybersecurity. Meanwhile…