TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition
Researchers have introduced TD-Grokking, a framework that enables large language models to learn from zero-reward problems. The method recursively breaks a complex, intractable problem into smaller, verifiable subproblems. These subproblems form a hierarchy whose solvable leaves provide the optimization signal for model improvement. In evaluations on mathematical and medical tasks, TD-Grokking significantly outperformed existing baseline approaches.
AI IMPACT: Enables LLMs to learn from previously unsolvable zero-reward problems, potentially expanding their capabilities in complex reasoning tasks.
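
The recursive decomposition described in the summary can be illustrated with a small toy sketch. This is not the TD-Grokking implementation: the names `Node`, `build_tree`, `solvable_leaves`, `toy_attempt`, and `toy_decompose` are hypothetical, and the "model" and "decomposer" here are stand-in functions rather than an LLM and a verifier. The sketch only shows the general idea of expanding a zero-reward problem into a hierarchy and collecting the leaves that can actually be solved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Problem = Tuple[int, ...]  # toy stand-in for a real task description


@dataclass
class Node:
    problem: Problem
    reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def build_tree(
    problem: Problem,
    attempt: Callable[[Problem], float],
    decompose: Callable[[Problem], List[Problem]],
    max_depth: int = 3,
) -> Node:
    """Attempt a problem; if it earns zero reward, split it and recurse."""
    node = Node(problem, reward=attempt(problem))
    if node.reward == 0.0 and max_depth > 0:
        node.children = [
            build_tree(sub, attempt, decompose, max_depth - 1)
            for sub in decompose(problem)
        ]
    return node


def solvable_leaves(node: Node) -> List[Problem]:
    """Collect leaf problems with non-zero reward (the usable signal)."""
    if not node.children:
        return [node.problem] if node.reward > 0 else []
    leaves: List[Problem] = []
    for child in node.children:
        leaves.extend(solvable_leaves(child))
    return leaves


# Assumption: the toy "model" only succeeds on problems of length <= 1.
def toy_attempt(problem: Problem) -> float:
    return 1.0 if len(problem) <= 1 else 0.0


# Assumption: the toy decomposer simply splits a problem in half.
def toy_decompose(problem: Problem) -> List[Problem]:
    mid = len(problem) // 2
    return [problem[:mid], problem[mid:]]


tree = build_tree((1, 2, 3, 4), toy_attempt, toy_decompose)
leaves = solvable_leaves(tree)
print(leaves)
```

In this toy setup the root problem earns zero reward, yet its solvable leaves still supply examples that a training loop could learn from. How the actual method verifies subproblems or turns leaf rewards into updates is not specified in the summary and is left out here.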