Researchers investigated whether large language models actually use their intermediate 'scratchpad' reasoning steps in subsequent computation. By editing the internal representations of these scratchpad states and observing the resulting predictions, they found that models trained to use scratchpads causally adjust their later steps to match the edited states. The effect held across different model families, suggesting that scratchpad oversight can train models to use written states as part of their computation rather than only for human legibility.
AI IMPACT This research suggests that current methods for training LLMs to use intermediate reasoning steps may be effective, potentially leading to more reliable and interpretable AI systems.
RANK_REASON Academic paper detailing a new research finding on LLM internal reasoning.
AI-generated summary · Google Gemini · from 1 source.