Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning
A large portion of the passenger requests is reportedly unserviced, partially due to vacant for-hire drivers' cruising behavior during the passenger seeking process. This paper aims to model the multi-driver repositioning task through a mean field multi-agent reinforcement learning (MARL) approach. Noticing that the direct application of MARL to the multi-driver system under a given reward mechanism will very likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes a reward design scheme with which a more desired equilibrium can be reached. To effectively solve the bilevel optimization problem with upper level as the reward design and the lower level as a multi-agent system (MAS), a Bayesian optimization algorithm is adopted to speed up the learning process. We then use a synthetic dataset to test the proposed model. The results show that the weighted average of order response rate and overall service charge can be improved by 4 service charge, compared with that of no reward design.
READ FULL TEXT