Brief · PulseAugur

TOOL · arXiv stat.ML English(EN) · 2w

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

A new research paper studies how transformers learn sparse Boolean functions, comparing two fine-tuning mechanisms: Reinforcement Learning (RL) with process rewards and Supervised Fine-Tuning (SFT). The study identifies conditions under which transformers provably learn these functions and demonstrates this for k-PARITY, k-AND, and k-OR. The key finding is that the two methods learn differently: RL learns the entire reasoning chain simultaneously, whereas SFT learns it step by step. This offers insight into the learning dynamics underlying each fine-tuning approach.
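The brief does not define the target functions or the form of the reasoning chain, so the sketch below is illustrative only. It assumes the standard definitions over bits in {0, 1}: a sparse Boolean function reads only k "relevant" coordinates (the hypothetical `support` list) out of n inputs. It also assumes, as a common formulation and not necessarily the paper's exact one, that the "reasoning chain" is the sequence of prefix results, where step j combines the first j+1 relevant bits. All function names are hypothetical.

```python
import operator
from functools import reduce
from typing import Callable, List, Sequence

# Hypothetical helpers: x is a bit vector in {0, 1}^n, support lists the
# k relevant coordinates that a sparse Boolean function actually reads.

def k_parity(x: Sequence[int], support: Sequence[int]) -> int:
    """XOR of the k relevant bits."""
    return reduce(operator.xor, (x[i] for i in support), 0)

def k_and(x: Sequence[int], support: Sequence[int]) -> int:
    """1 only if every relevant bit is 1."""
    return int(all(x[i] for i in support))

def k_or(x: Sequence[int], support: Sequence[int]) -> int:
    """1 if any relevant bit is 1."""
    return int(any(x[i] for i in support))

def reasoning_chain(
    x: Sequence[int],
    support: Sequence[int],
    op: Callable[[int, int], int],
) -> List[int]:
    """Assumed chain format: prefix results over the relevant bits.

    Step j holds op applied to the first j+1 relevant bits, so the last
    step equals the function's output.
    """
    acc = x[support[0]]
    chain = [acc]
    for i in support[1:]:
        acc = op(acc, x[i])
        chain.append(acc)
    return chain

if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1, 0, 0]   # n = 8 input bits
    support = [0, 1, 3, 5]         # k = 4 relevant coordinates
    print(k_parity(x, support), k_and(x, support), k_or(x, support))
    print(reasoning_chain(x, support, operator.xor))
```

With this example input the relevant bits are 1, 0, 1, 1, so k-PARITY gives 1, k-AND gives 0, k-OR gives 1, and the XOR chain is [1, 1, 0, 1]. Under this framing, "step by step" learning would mean acquiring the early prefix steps before the later ones, while "simultaneous" learning would mean all steps improve together.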

IMPACT: Offers theoretical insight into how the choice of fine-tuning method affects what transformers can learn on specific reasoning tasks.

Transformers
Reinforcement Learning
Supervised Fine-Tuning
k-PARITY
Boolean functions
Bochen Lyu