Reinforcement Learning Foundation Models Should Already Be A Thing
A new research paper proposes developing foundation models specifically for reinforcement learning (RL), arguing that RL is a conspicuous gap next to language and vision, where such models are already established. The authors argue that Markov decision processes (MDPs) are well-suited to attention-based architectures like those used in tabular foundation models. As a demonstration, they trained a model on synthetic MDPs that solved held-out tabular benchmarks with minimal tuning. In online settings it outperformed classical baselines such as UCB-VI and tabular Q-learning, and in offline settings it was competitive with VI-LCB.
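To make "synthetic MDPs" and "tabular benchmarks" more concrete, here is a minimal sketch, assuming a tabular MDP is stored as a transition table `P[s][a]` (a distribution over next states) and a reward table `R[s][a]`. The generator (flat Dirichlet transitions, uniform rewards) and the function names `sample_tabular_mdp` and `value_iteration` are hypothetical; the summary does not describe the paper's actual data-generation procedure or training targets. Value iteration is included because it is the classical planning routine that baselines such as UCB-VI and VI-LCB build on by adding confidence bonuses or penalties. It is not the paper's model.

```python
import random


def sample_tabular_mdp(n_states, n_actions, seed=0):
    """Hypothetical generator: P[s][a] is drawn from a flat Dirichlet over
    next states (normalized i.i.d. Exp(1) draws), R[s][a] is uniform in [0, 1)."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            weights = [rng.expovariate(1.0) for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Classical value iteration: repeat the Bellman optimality backup
    V(s) <- max_a [R(s, a) + gamma * sum_s' P(s' | s, a) V(s')]
    until successive value functions differ by less than tol."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        Q = [
            [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        new_V = [max(q_s) for q_s in Q]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the last computed Q-values.
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


if __name__ == "__main__":
    P, R = sample_tabular_mdp(n_states=5, n_actions=3, seed=42)
    V, policy = value_iteration(P, R, gamma=0.9)
    print("values:", [round(v, 3) for v in V])
    print("greedy policy:", policy)
```

In a pretraining setup of the kind the paper describes, many such tables could be sampled cheaply. Each `(s, a)` row then looks like a record in a structured dataset, which is the intuition behind reusing tabular-style attention models. How the paper actually encodes these tables is not stated in the summary.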
IMPACT: Could accelerate the development of more capable and generalizable AI agents by leveraging structured data and attention mechanisms.