ICML 2026: Automatically Generating Programs from Input-Output Examples - Reinforcement Learning Provides Reasoning Process Supervision for Large Model Programming-By-Example Tasks
Researchers have developed PRM-PBE, a framework for improving how large language models (LLMs) handle Programming-by-Example (PBE) tasks. Current LLMs often struggle to infer the underlying program logic from a small set of input-output examples, because their intermediate reasoning steps receive no fine-grained supervision. PRM-PBE trains a process reward model (PRM) on feedback-guided reasoning trees to evaluate the reliability of intermediate steps, then combines it with a three-stage curriculum learning approach and PPO optimization for program synthesis. Experiments across multiple benchmarks showed significant improvements over existing methods, even when those methods use advanced models such as DeepSeek-Coder-V2 and Claude-3.5-Sonnet.
IMPACT: Enhances LLM program synthesis by providing intermediate reasoning supervision, potentially improving reliability in complex coding tasks.
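To make the PBE setting concrete, here is a minimal Python sketch of the general idea: candidate programs are checked against the input-output examples, and the consistent ones are ranked by a score over their reasoning steps. It is not the PRM-PBE implementation. The per-step scores are hard-coded placeholders standing in for a process reward model, taking the minimum step score as the chain score is an assumption, and all function and variable names are hypothetical. Curriculum learning and PPO training are not shown.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)
# (name, program, per-step reliability scores of the reasoning that produced it)
Candidate = Tuple[str, Callable[[str], str], List[float]]


def satisfies_examples(program: Callable[[str], str],
                       examples: Sequence[Example]) -> bool:
    """Return True only if the program reproduces every expected output."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A crashing candidate counts as inconsistent with the examples.
            return False
    return True


def chain_score(step_scores: Sequence[float]) -> float:
    """Aggregate step scores; using the weakest step is an assumption."""
    return min(step_scores) if step_scores else 0.0


def select_program(candidates: Sequence[Candidate],
                   examples: Sequence[Example]) -> Optional[str]:
    """Pick the example-consistent candidate with the best chain score."""
    consistent = [c for c in candidates if satisfies_examples(c[1], examples)]
    if not consistent:
        return None
    best = max(consistent, key=lambda c: chain_score(c[2]))
    return best[0]


examples: List[Example] = [("hello world", "Hello World"),
                           ("icml 2026", "Icml 2026")]

# Placeholder scores; a real PRM would compute these from the reasoning steps.
candidates: List[Candidate] = [
    ("upper", lambda s: s.upper(), [0.99, 0.99]),
    ("title", lambda s: s.title(), [0.90, 0.80]),
    ("split_cap",
     lambda s: " ".join(w.capitalize() for w in s.split(" ")),
     [0.95, 0.40]),
]

selected = select_program(candidates, examples)
print(selected)
```

In this toy run, `upper` is rejected because it fails the examples despite its high step scores, and `title` is preferred over `split_cap` because its weakest reasoning step scores higher. The point of step-level scoring is that two programs can both pass the examples while differing in how reliable the reasoning behind them looks.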