Online LLM Selection via Constrained Bandits with Time-Varying Demand
Researchers have developed an online learning algorithm for selecting the best Large Language Model (LLM) for diverse user tasks in edge-cloud inference systems. The algorithm handles time-varying task demand and operates under resource constraints such as monetary budgets and latency guarantees. It combines confidence bounds with demand predictions to maximize reward while satisfying the constraints in the long run, and it comes with theoretical guarantees of sublinear regret and sublinear constraint violation relative to offline methods.
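To make the idea of combining confidence bounds with a long-term constraint more concrete, here is a minimal sketch in Python. It is not the researchers' algorithm: it swaps in a generic textbook combination of UCB-style estimates with a virtual queue that grows whenever demand-weighted spending exceeds a per-round budget. The class name `ConstrainedUCBSelector`, the `tradeoff` parameter, the two simulated models, and all numeric values are assumptions made for illustration. The sketch uses observed demand only when charging cost against the budget, and it leaves out the demand-prediction step and the latency constraint entirely.

```python
import math
import random


class ConstrainedUCBSelector:
    """Hypothetical sketch: optimistic reward/cost estimates plus a virtual queue."""

    def __init__(self, n_models, budget_per_round, tradeoff=10.0):
        self.n = n_models
        self.budget = budget_per_round  # allowed demand-weighted cost per round
        self.tradeoff = tradeoff        # weight on reward relative to queue pressure
        self.counts = [0] * n_models
        self.reward_sum = [0.0] * n_models
        self.cost_sum = [0.0] * n_models
        self.queue = 0.0                # accumulated budget overspend, never negative
        self.t = 0

    def select(self):
        self.t += 1
        # Try every model once before trusting any estimate.
        for k in range(self.n):
            if self.counts[k] == 0:
                return k
        best, best_score = 0, -math.inf
        for k in range(self.n):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[k])
            # Optimistic estimates: upper bound on reward, lower bound on cost.
            reward_ucb = min(1.0, self.reward_sum[k] / self.counts[k] + bonus)
            cost_lcb = max(0.0, self.cost_sum[k] / self.counts[k] - bonus)
            # A large queue makes expensive models less attractive.
            score = self.tradeoff * reward_ucb - self.queue * cost_lcb
            if score > best_score:
                best, best_score = k, score
        return best

    def update(self, k, reward, cost, demand=1.0):
        self.counts[k] += 1
        self.reward_sum[k] += reward
        self.cost_sum[k] += cost
        # Spending scales with demand; the queue drains when under budget.
        self.queue = max(0.0, self.queue + demand * cost - self.budget)


def simulate(rounds, seed=0):
    """Toy environment (assumed values): model 0 is cheap, model 1 is better but costly."""
    rng = random.Random(seed)
    mean_reward = [0.5, 0.9]
    mean_cost = [0.2, 0.9]
    selector = ConstrainedUCBSelector(n_models=2, budget_per_round=0.5)
    total_cost = 0.0
    for t in range(rounds):
        demand = 0.5 if t % 2 == 0 else 1.5  # simple time-varying demand
        k = selector.select()
        reward = mean_reward[k] + rng.uniform(-0.05, 0.05)
        cost = mean_cost[k] + rng.uniform(-0.05, 0.05)
        selector.update(k, reward, cost, demand)
        total_cost += demand * cost
    return selector, total_cost / rounds


if __name__ == "__main__":
    selector, avg_cost = simulate(2000)
    print("selections per model:", selector.counts)
    print("average demand-weighted cost per round:", round(avg_cost, 3))
```

In this toy setup the selector mixes the two models rather than always choosing the stronger one, because the queue penalizes the costly model whenever cumulative spending runs ahead of the budget. Total overspend is bounded by the final queue length, so a queue that stays bounded means the average constraint violation shrinks as the number of rounds grows.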