Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

Researchers explore dueling bandits for robust contexts, fairness, and unknown delays

By the PulseAugur editorial team · [2 sources] · 2026-05-05 04:00

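Two new research papers explore advances in dueling bandit algorithms, a class of methods for learning from preference data in machine learning. The first paper tackles unknown delays and adversarial corruption in volatile environments, proposing a new algorithm whose regret upper bound accounts for corruption and delay additively. The second paper focuses on fairness in multi-user dueling bandits, introducing a framework based on Nash social welfare to ensure that minority groups are not marginalized, and deriving regret bounds for fair algorithms.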
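To make the fairness idea concrete, here is a minimal sketch of how a Nash social welfare objective can rank options differently from a plain average. It is not the algorithm from either paper. The per-user utilities, the arm names, and the helper functions are all hypothetical, and the utilities are assumed known here, whereas a dueling bandit would have to estimate them from pairwise comparisons.

```python
import math


def average_welfare(utilities):
    """Utilitarian objective: arithmetic mean of per-user utilities."""
    return sum(utilities) / len(utilities)


def nash_welfare(utilities):
    """Nash social welfare, written as the geometric mean of utilities.

    Any user with zero utility drives the whole score to zero, which is
    what penalizes options that leave one group with almost nothing.
    """
    if any(u <= 0 for u in utilities):
        return 0.0
    return math.exp(sum(math.log(u) for u in utilities) / len(utilities))


# Hypothetical utilities (illustrative numbers only): three majority users
# followed by one minority user, for each of two candidate arms.
ARM_UTILITIES = {
    "arm_a": [0.9, 0.9, 0.9, 0.05],  # great for the majority, poor for the minority
    "arm_b": [0.6, 0.6, 0.6, 0.6],   # moderate for everyone
}


def best_arm(welfare):
    """Return the arm name that maximizes the given welfare function."""
    return max(ARM_UTILITIES, key=lambda arm: welfare(ARM_UTILITIES[arm]))


best_by_average = best_arm(average_welfare)
best_by_nash = best_arm(nash_welfare)

print("average picks:", best_by_average)
print("Nash welfare picks:", best_by_nash)
```

With these illustrative numbers the average favors the arm that serves the majority well, while the Nash objective favors the arm that gives every user a moderate utility.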

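Impact: These papers advance the theoretical understanding of preference learning and could improve fairness and robustness in applications such as LLM fine-tuning.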

Ranking rationale: Two academic papers published on arXiv propose new algorithms and theoretical analyses for dueling bandit problems.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · Youngmin Oh · 2026-05-05 04:00

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv:2605.01752v1. Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial del…

arXiv cs.LG · Maheed H. Ahmed, Mahsa Ghasemi · 2026-05-05 04:00

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv:2605.01961v1. Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human…
