Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 3d · [2 sources]

Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts

A new research paper introduces the Hypentropy Policy Gradient (HPG) algorithm for optimizing embedding model routing in recommendation systems. The paper formalizes the routing problem as an adversarial contextual linear bandit with low-rank experts, a setting that accounts for adversarial queries and limited model observability. HPG adapts to the unknown low-rank structure and achieves a policy regret of $\tilde{\mathcal{O}}(s\sqrt{MT})$, with an efficient, parameter-free implementation.
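For readers unfamiliar with adversarial bandit routing, the following is a minimal sketch of the general idea, not the paper's HPG algorithm. It uses EXP3, a classic adversarial bandit method, and ignores both the query context and the low-rank expert structure that the paper exploits. The names `exp3_router`, `reward_fn`, and `gamma` are hypothetical and chosen for illustration only; the sketch assumes rewards lie in [0, 1] and that only the chosen model's reward is observed each round.

```python
import math
import random


def exp3_router(num_models, rounds, reward_fn, gamma=0.1, seed=0):
    """Route each round to one of `num_models` models using EXP3.

    reward_fn(t, arm) is a hypothetical callback returning a reward in
    [0, 1] for the model picked at round t. Only that reward is seen.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_models
    picks = []
    probs = [1.0 / num_models] * num_models
    for t in range(rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_models for w in weights]
        arm = rng.choices(range(num_models), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: unbiased for the chosen arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_models)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
        picks.append(arm)
    return picks, probs


if __name__ == "__main__":
    # Toy environment (an assumption): model 2 is always the best choice.
    picks, probs = exp3_router(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
    print("share of rounds routed to model 2:", picks.count(2) / len(picks))
```

Unlike this sketch, EXP3 requires a hand-set exploration rate `gamma`, whereas the summary describes HPG as parameter-free.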
