Brief · PulseAugur

RESEARCH · Alignment Forum · English (EN) · 4d · [2 sources]

My research: a computational cognitive neuroscience perspective on alignment

Researchers have proposed a metric called "task complexity": the length of the shortest program that achieves a target performance on a task. The metric is meant to operationalize the superficial alignment hypothesis by framing it as the claim that pre-training greatly reduces the complexity of achieving strong performance on a task. Experiments indicate that reaching strong performance from a pre-trained model alone can still require large programs, whereas post-training collapses this complexity to kilobytes.
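
As a rough illustration of the definition above, here is a minimal Python sketch. It is not the researchers' method: the true shortest program over all possible programs cannot be found by enumeration in practice, so this toy searches only a finite candidate set and therefore yields an upper bound. The function name `task_complexity_upper_bound`, the `evaluate` callback, and the toy scores are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def task_complexity_upper_bound(
    candidates: Iterable[bytes],
    evaluate: Callable[[bytes], float],
    target: float,
) -> Optional[int]:
    """Length in bytes of the shortest candidate whose score meets the target.

    Only the supplied candidates are searched, so the result is an upper
    bound on task complexity. Returns None if no candidate qualifies.
    """
    qualifying = [len(program) for program in candidates if evaluate(program) >= target]
    return min(qualifying) if qualifying else None


# Hypothetical toy data: each "program" is a byte string with a made-up score.
toy_scores = {
    b"x" * 4096: 0.92,  # long program, strong performance
    b"y" * 512: 0.85,   # shorter program, still above the target
    b"z" * 16: 0.40,    # very short program, weak performance
}

# Iterating a dict yields its keys, so the candidates are the byte strings.
bound = task_complexity_upper_bound(toy_scores, toy_scores.get, target=0.80)
print(bound)  # 512
```

In this framing, a drop in the bound once a model is supplied as given context corresponds to the "collapse" in complexity the summary describes.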

IMPACT: This research offers a way to measure how much additional information is needed to elicit the knowledge an LLM already holds, which could guide future alignment strategies.

AI
Alignment Forum
Tomás Vergara Browne
Superficial Alignment Hypothesis