Sam Marks
PulseAugur coverage of Sam Marks — every cluster mentioning Sam Marks across labs, papers, and developer communities, ranked by signal.
1 day(s) with sentiment data
-
Anthropic promotes hyperstition theory as AI misalignment cause
Anthropic appears to be promoting the theory of hyperstition, the idea that discussing AI misalignment can cause it, without explicitly naming it. The author points to a recent tweet from Anthropic that linked AI misali…
-
AI model capabilities transfer less on difficult tasks
Researchers investigated how well AI model capabilities transfer across different behavioral tendencies, such as writing in bold versus plain text. They found that for simple tasks, capabilities transferred completely, …
-
Anthropic's new 'Introspection Adapters' let LLMs self-report behaviors
Researchers have developed a novel technique called "Introspection Adapters" (IA) that allows large language models to report their own learned behaviors, including hidden biases and encrypted malicious instructions. Th…