PulseAugur

Apple-tasting regret with switching costs scales with the square root of time

Researchers have analyzed the two-action apple-tasting problem with switching costs against an oblivious adversary. At each round the learner chooses between a revealing action and a blind action, and pays a cost whenever it switches from one to the other. They show that the oblivious minimax expected regret scales with the square root of time: it lies between $\frac{1}{2\sqrt3}\sqrt T$ and $2\sqrt3\sqrt T$. This finding resolves a long-standing question regarding the classification of feedback graphs in this setting, demonstrating that a previously suspected $\Omega(T^{2/3})$ obstruction does not exist.
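To give a feel for the two constants quoted above, here is a minimal Python sketch that evaluates both bounds for a given horizon and prints them next to $T^{2/3}$. The function name `regret_bounds` is hypothetical, and the $T^{2/3}$ reference curve uses a leading constant of 1, which is an assumption made only for illustration; nothing here reproduces the paper's algorithm or proof.

```python
import math


def regret_bounds(T: int) -> tuple[float, float]:
    """Return the (lower, upper) bounds on minimax regret quoted in the summary.

    lower = sqrt(T) / (2 * sqrt(3)), upper = 2 * sqrt(3) * sqrt(T).
    """
    if T < 1:
        raise ValueError("T must be a positive horizon")
    root_T = math.sqrt(T)
    lower = root_T / (2 * math.sqrt(3))
    upper = 2 * math.sqrt(3) * root_T
    return lower, upper


if __name__ == "__main__":
    for T in (100, 1728, 10**6):
        lower, upper = regret_bounds(T)
        # T ** (2 / 3) is a reference curve with constant 1 (an assumption).
        print(f"T={T}: {lower:.2f} <= regret <= {upper:.2f}; T^(2/3) = {T ** (2 / 3):.2f}")
```

Two things fall out of the arithmetic. The upper and lower constants differ by a factor of exactly $2\sqrt3 \cdot 2\sqrt3 = 12$, independent of $T$. And under the constant-1 assumption, the upper bound $2\sqrt3\sqrt T$ drops below $T^{2/3}$ once $T > (2\sqrt3)^6 = 1728$, which is a concrete way to see that the rate is strictly better than $T^{2/3}$ for long horizons.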

IMPACT Refines theoretical understanding of decision-making under uncertainty with switching costs, potentially informing agent design.

RANK_REASON Academic paper published on arXiv detailing a theoretical problem in machine learning.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Tommaso Cesari, Roberto Colomboni

    Two-Action Apple Tasting with Switching Costs

    arXiv:2606.03851v1. Abstract: We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the reveali…

  2. arXiv cs.LG TIER_1 English(EN) · Roberto Colomboni

    Two-Action Apple Tasting with Switching Costs

    We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward $0$ and reveals the hidde…