Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

DoorDash uses reinforcement learning to adapt delivery dispatch from delayed feedback

By the PulseAugur editorial team · [2 sources] · 2026-06-11 17:21

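Researchers have developed a multi-agent reinforcement learning system for DoorDash that uses delayed marketplace feedback to adjust dispatch objective weights. Deployed at the store level, the system selects multipliers that tune the trade-off between delivery quality and batching efficiency. The approach enables off-policy learning from noisy, delayed signals while preserving operational safeguards. A production experiment showed that the policy improved batching efficiency and reduced courier time cost without degrading delivery quality.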
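To make the store-level idea concrete, here is a minimal sketch of one agent per store choosing a multiplier and updating its estimate only when the delayed outcome arrives. It is not the paper's method: it swaps in a simple epsilon-greedy bandit rather than the off-policy learner the paper describes, and every name and number in it (`StoreAgent`, `MULTIPLIERS`, `SAFE_RANGE`, the scalar reward) is a hypothetical chosen for illustration.

```python
import random
from dataclasses import dataclass, field

# Hypothetical multipliers applied to a batching-efficiency weight in the
# dispatch objective. Illustrative values, not taken from the paper.
MULTIPLIERS = (0.6, 0.8, 1.0, 1.2, 1.4)
# Assumed operational guardrail: only multipliers inside this range may be chosen.
SAFE_RANGE = (0.8, 1.2)
ALLOWED = tuple(m for m in MULTIPLIERS if SAFE_RANGE[0] <= m <= SAFE_RANGE[1])


@dataclass
class StoreAgent:
    """One agent per store: epsilon-greedy choice, updated from delayed rewards."""
    epsilon: float = 0.1    # exploration probability
    step_size: float = 0.2  # learning rate for the running value estimate
    values: dict = field(default_factory=lambda: {m: 0.0 for m in ALLOWED})
    pending: dict = field(default_factory=dict)  # decision_id -> chosen multiplier

    def act(self, decision_id, rng):
        """Pick a multiplier now and remember it until feedback arrives."""
        if rng.random() < self.epsilon:
            multiplier = rng.choice(ALLOWED)
        else:
            multiplier = max(self.values, key=self.values.get)
        self.pending[decision_id] = multiplier
        return multiplier

    def observe(self, decision_id, reward):
        """Apply a delayed reward to the multiplier that produced it."""
        multiplier = self.pending.pop(decision_id, None)
        if multiplier is None:
            return  # unknown or already-processed decision: ignore
        old = self.values[multiplier]
        self.values[multiplier] = old + self.step_size * (reward - old)


# Usage: independent agents keyed by a hypothetical store id.
rng = random.Random(0)
agents = {"store_a": StoreAgent(), "store_b": StoreAgent()}
chosen = agents["store_a"].act("decision-1", rng)
# ... later, once delivery outcomes are known, feed back a scalar reward.
agents["store_a"].observe("decision-1", reward=1.0)
```

Restricting the action set to `ALLOWED` up front is one simple way to express a safeguard: the learner never proposes a value outside the range, so no downstream clipping is needed.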

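Impact: Demonstrates how reinforcement learning can optimize complex logistics systems with delayed feedback, with the potential to improve efficiency on other delivery platforms.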

Ranking rationale: This cluster contains an academic paper that details a deployed, real-world system.

Read on arXiv cs.MA (Multiagent) →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Haochen Wu, Yi Hou, Shiguang Xie · 2026-06-12 04:00

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

arXiv:2606.13604v1 Announce Type: new Abstract: Dispatch in three-sided marketplaces provides a natural setting for reinforcement learning from world feedback: decisions are evaluated by delayed operational outcomes such as delivery speed, courier utilization, and merchant conges…
arXiv cs.MA (Multiagent) TIER_1 English(EN) · Shiguang Xie · 2026-06-11 17:21

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

Dispatch in three-sided marketplaces provides a natural setting for reinforcement learning from world feedback: decisions are evaluated by delayed operational outcomes such as delivery speed, courier utilization, and merchant congestion. We present a deployed reinforcement learni…
