PulseAugur
Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

DoorDash uses reinforcement learning to adapt delivery dispatch from delayed feedback

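Researchers have developed a multi-agent reinforcement learning system for DoorDash that uses delayed marketplace feedback to adapt dispatch objective weights. Deployed at the store level, the system selects multipliers that adjust the trade-off between delivery quality and batching efficiency. This approach enables off-policy learning from noisy, delayed signals while preserving operational safeguards. A production experiment showed that the policy improved batching efficiency and reduced courier time cost without degrading delivery quality.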
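To make the store-level loop above more concrete, here is a minimal Python sketch of one agent that picks a multiplier now and updates its estimate only when the delayed outcome arrives. It is not the paper's algorithm: it swaps in a simple epsilon-greedy bandit with a running-mean update in place of the off-policy method the summary mentions. The multiplier values, the default fallback, the scalar reward, and all names (`StoreAgent`, `select`, `feedback`) are hypothetical.

```python
import random

# Hypothetical candidate multipliers; the source does not list actual values.
MULTIPLIERS = (0.8, 1.0, 1.2)
# Assumed safe fallback used before any feedback has arrived.
DEFAULT_MULTIPLIER = 1.0


class StoreAgent:
    """Epsilon-greedy agent for a single store (illustrative only)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {m: 0.0 for m in MULTIPLIERS}  # running mean reward
        self.count = {m: 0 for m in MULTIPLIERS}    # feedback samples seen
        self.pending = {}  # decision_id -> multiplier awaiting feedback

    def select(self, decision_id):
        """Choose a multiplier and remember it until feedback arrives."""
        if sum(self.count.values()) == 0:
            m = DEFAULT_MULTIPLIER  # safeguard: no evidence yet
        elif self.rng.random() < self.epsilon:
            m = self.rng.choice(MULTIPLIERS)  # explore
        else:
            m = max(MULTIPLIERS, key=lambda a: self.value[a])  # exploit
        self.pending[decision_id] = m
        return m

    def feedback(self, decision_id, reward):
        """Apply a delayed reward; return False if the decision is unknown."""
        m = self.pending.pop(decision_id, None)
        if m is None:
            return False
        self.count[m] += 1
        # Incremental mean: v <- v + (r - v) / n
        self.value[m] += (reward - self.value[m]) / self.count[m]
        return True
```

In this sketch one `StoreAgent` would be kept per store, with `select` called at dispatch time and `feedback` called later once the operational outcome for that decision is known. The reward would need to combine the delayed signals into a single number, which is left unspecified here.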

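Impact: Demonstrates how reinforcement learning can optimize complex logistics systems with delayed feedback, with the potential to improve efficiency on other delivery platforms.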

Ranking rationale: This cluster contains an academic paper describing in detail a deployed, real-world application system.

Read on arXiv cs.MA (Multiagent)

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Haochen Wu, Yi Hou, Shiguang Xie

    Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

    arXiv:2606.13604v1. Abstract: Dispatch in three-sided marketplaces provides a natural setting for reinforcement learning from world feedback: decisions are evaluated by delayed operational outcomes such as delivery speed, courier utilization, and merchant conges…

  2. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Shiguang Xie

    Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

    Dispatch in three-sided marketplaces provides a natural setting for reinforcement learning from world feedback: decisions are evaluated by delayed operational outcomes such as delivery speed, courier utilization, and merchant congestion. We present a deployed reinforcement learni…