Trajectory Modeling via Random Utility Inverse Reinforcement Learning
We consider the problem of modeling the trajectories of drivers in a road network from the perspective of inverse reinforcement learning. As rational agents, drivers aim to maximize a reward function that is unknown to an external observer as they construct their trajectories. We apply the concept of random utility from microeconomic theory to model the unknown reward function as a function of observable features plus an error term that represents features known only to the driver. We develop a parameterized generative model for the trajectories based on a random utility Markov decision process formulation of drivers' decisions. We show that maximum entropy inverse reinforcement learning is a particular case of our proposed formulation when a Gumbel density is assumed for the unobserved reward error terms. We illustrate Bayesian inference on the model parameters through a case study with real trajectory data from a large city, obtained from sensors placed at sparsely distributed points on the street network.
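As a brief sketch of the stated connection (the notation below is ours, not necessarily the paper's): if the value the driver assigns to an action is a deterministic part $v(s,a)$ (for example, a linear feature term $\theta^\top \phi(s,a)$ plus a continuation value) perturbed by an i.i.d. Gumbel error, the standard random utility argument yields a softmax choice rule, which is the policy form of maximum entropy inverse reinforcement learning:
$$
U(s,a) = v(s,a) + \varepsilon_{s,a}, \qquad \varepsilon_{s,a} \sim \mathrm{Gumbel}(0,1) \ \text{i.i.d.},
$$
$$
\pi(a \mid s) = \Pr\!\Big[\,U(s,a) \ge U(s,a') \ \ \forall a'\,\Big] = \frac{\exp\big(v(s,a)\big)}{\sum_{a'} \exp\big(v(s,a')\big)}.
$$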