This paper investigates the multi-secretary problem, focusing on additive regret, which measures the difference between the optimal offline reward and the reward earned by an online policy. Prior work established logarithmic regret bounds for certain distributions and quadratic bounds for others. The paper proves that a quadratic lower bound is necessary for mixtures of two separated uniform distributions, indicating that existing upper bounds for gapped distributions are tight. The proofs rely on explicitly constructed Bellman certificates, which also explain why gaps in the support lead to larger regret.
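To make the notion of additive regret concrete, here is a minimal Python sketch. It assumes i.i.d. Uniform[0, 1] values, a hiring budget `k`, and a simple budget-ratio threshold rule. The rule and the function names `offline_reward`, `online_reward`, and `average_regret` are illustrative assumptions, not the policy or the Bellman-certificate argument from the paper.

```python
import random


def offline_reward(values, k):
    """Hindsight optimum: the sum of the k largest values."""
    return sum(sorted(values, reverse=True)[:k])


def online_reward(values, k):
    """Reward of an illustrative threshold rule (an assumption, not the paper's policy).

    Assumes values are drawn from Uniform[0, 1]. A candidate is hired when the
    value is at least the (1 - budget / remaining) quantile of that distribution.
    """
    budget, total = k, 0.0
    n = len(values)
    for t, v in enumerate(values):
        if budget == 0:
            break
        remaining = n - t  # candidates left, including the current one
        threshold = 1.0 - budget / remaining
        if v >= threshold:
            total += v
            budget -= 1
    return total


def average_regret(n, k, trials, seed=0):
    """Monte Carlo estimate of the additive regret: E[offline - online]."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        gap += offline_reward(values, k) - online_reward(values, k)
    return gap / trials
```

Calling `average_regret(n, k, trials)` for increasing `n` gives a rough empirical picture of how the regret of this particular rule grows. It says nothing about the lower bounds discussed above, which concern every online policy.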
Academic paper published on arXiv detailing theoretical computer science research.
AI-generated summary · Google Gemini · from 1 source.