Learn to Match: Two-Sided Matching with Temporally Extended Feedback
Researchers have developed a new framework for two-sided matching markets that accounts for information revealed over time, moving beyond static preference models. This framework, instantiated as the Learn2Match benchmark, uses a partially observable Markov game to model dynamic interactions like interviews and evolving profiles. The benchmark evaluates multi-agent reinforcement learning (MARL) policies, finding that while PPO shows promise in improving social welfare and reducing regret, it still struggles with information friction compared to bandit-style methods.
AI IMPACT: Introduces a new benchmark for developing adaptive algorithms in dynamic matching markets, potentially improving resource allocation and decision-making.