PulseAugur
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

Researchers explore dueling bandits for robust contexts, fairness, and unknown delays

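Two new research papers examine advances in dueling bandit algorithms, a family of methods for learning from preference data in machine learning. The first paper addresses unknown delays and adversarial corruption in volatile environments, proposing a new algorithm whose regret upper bound accounts for corruption and delay additively. The second paper focuses on fairness in multi-user dueling bandits, introducing a framework that uses Nash social welfare to ensure minority groups are not marginalized, and derives regret bounds for fair algorithms.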
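
To illustrate why a Nash social welfare objective can protect a minority group, here is a minimal Python sketch. It is not taken from either paper: the arm names and utility values are hypothetical, and Nash social welfare is computed here as the geometric mean of per-user utilities, a common textbook form that may differ from the paper's exact formulation.

```python
import math


def average_welfare(utilities):
    """Arithmetic mean of per-user utilities."""
    return sum(utilities) / len(utilities)


def nash_social_welfare(utilities):
    """Geometric mean of per-user utilities (assumes non-negative values)."""
    if any(u <= 0 for u in utilities):
        # A single user with zero utility drives the product to zero.
        return 0.0
    return math.exp(sum(math.log(u) for u in utilities) / len(utilities))


# Hypothetical utilities for three users: two in a majority group, one in a minority.
arm_utilities = {
    "arm_a": [0.9, 0.9, 0.1],  # strong for the majority, poor for the minority
    "arm_b": [0.6, 0.6, 0.6],  # moderate for everyone
}

best_by_average = max(arm_utilities, key=lambda a: average_welfare(arm_utilities[a]))
best_by_nash = max(arm_utilities, key=lambda a: nash_social_welfare(arm_utilities[a]))

print("Best arm by average utility:", best_by_average)
print("Best arm by Nash social welfare:", best_by_nash)
```

Under these made-up numbers, averaging favors the arm that serves the majority well, while the geometric mean penalizes the low minority utility and favors the balanced arm.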

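Impact: These papers advance the theoretical understanding of preference learning and could improve fairness and robustness in applications such as LLM fine-tuning.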

Ranking rationale: Two academic papers published on arXiv present new algorithms and theoretical analysis for dueling bandit problems.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.LG · Youngmin Oh

    Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

    arXiv:2605.01752v1. Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial del…

  2. arXiv cs.LG · Maheed H. Ahmed, Mahsa Ghasemi

    Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

    arXiv:2605.01961v1. Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human…