A sparse identification approach for automating choice models' specification
The methodology discussed in this paper aims to enhance the comprehensiveness and explanatory power of choice models for forecasting choice outcomes. To this end, we develop a data-driven method that leverages machine learning procedures to identify the most effective representation of variables in empirical probability specifications for mode choice. The method is particularly valuable in the face of big data and an abundance of variables, where it can search through many candidate models. It does so through a sparse identification procedure that looks for the sparsest specification (the most parsimonious model) in the domain of candidate functions, with potential applications in transportation planning and policy-making. Finally, we apply the method to synthetic choice data as a proof of concept. We perform two experiments and show that if the functional form used to generate the synthetic data lies in the domain of base functions, the methodology can recover it. Otherwise, the method raises a red flag by outputting small (near-zero) coefficients for the base functions.
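To make the idea of searching a domain of candidate functions for the sparsest specification concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a binary logit model, a small hypothetical library of base functions (`x1`, `x2`, `x1^2`, `x2^2`, `x1*x2`), and a sequentially thresholded fit in which coefficients below an arbitrary cutoff are pruned and the model is refit. The data-generating utility, the cutoff of 0.3, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def candidate_library(x1, x2):
    """Hypothetical library of base functions, stacked column-wise."""
    names = ["x1", "x2", "x1^2", "x2^2", "x1*x2"]
    theta = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])
    return names, theta

def fit_logit(theta, y, n_iter=30):
    """Binary logit fitted by Newton-Raphson (no intercept, for brevity)."""
    beta = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        z = np.clip(theta @ beta, -30.0, 30.0)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = theta.T @ (y - p)                # gradient of the log-likelihood
        w = p * (1.0 - p)
        # Negative Hessian of the log-likelihood, with a tiny ridge for stability.
        hess = theta.T @ (theta * w[:, None]) + 1e-8 * np.eye(theta.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def sparse_logit(theta, y, threshold=0.3, max_rounds=10):
    """Refit repeatedly, pruning candidates whose |coefficient| < threshold."""
    k = theta.shape[1]
    active = np.ones(k, dtype=bool)
    beta = np.zeros(k)
    for _ in range(max_rounds):
        beta = np.zeros(k)
        if not active.any():
            break
        beta[active] = fit_logit(theta[:, active], y)
        small = active & (np.abs(beta) < threshold)
        if not small.any():
            break                               # nothing left to prune
        active &= ~small
    beta[~active] = 0.0
    return beta

# Synthetic choices from an assumed utility that lies inside the library.
n = 10_000
x1 = rng.uniform(-2.0, 2.0, n)
x2 = rng.uniform(-2.0, 2.0, n)
true_utility = 1.5 * x1 - 1.0 * x2**2
prob = 1.0 / (1.0 + np.exp(-true_utility))
y = (rng.uniform(size=n) < prob).astype(float)

names, theta = candidate_library(x1, x2)
beta_hat = sparse_logit(theta, y)
for name, b in zip(names, beta_hat):
    print(f"{name:>6s}: {b:+.3f}")
```

In this sketch the cutoff is applied to raw coefficients, so the result depends on how the candidate functions are scaled. Standardizing the library columns, or choosing the cutoff by cross-validation, would be a natural refinement.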