My research: a computational cognitive neuroscience perspective on alignment
Researchers have proposed a new metric called "task complexity" to quantify the length of the shortest program needed to achieve a target performance on a task. This metric aims to operationalize the superficial alignment hypothesis, suggesting that pre-trained large language models significantly reduce the complexity of accessing their knowledge. Experiments indicate that while pre-training enables access to strong performance, it can require large programs, whereas post-training drastically collapses this complexity to kilobytes. AI
IMPACT This research offers a new way to measure and understand how LLMs store and retrieve information, potentially guiding future alignment strategies.