TPOT-SH: a Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets

11/27/2021
by lpyparmenier, et al.

Data are omnipresent nowadays and contain knowledge and patterns that machine learning (ML) algorithms can extract in order to make decisions or perform a task without explicit instructions. To achieve that, these algorithms learn a mathematical model from sample data. However, there are numerous ML algorithms, each learning a different model of reality. Furthermore, the behavior of these algorithms can be altered by modifying some of their many hyperparameters. Tuning these algorithms well is costly but essential to reach decent performance. It requires a lot of expertise and remains hard even for experts, who tend to resort to exploration-only approaches such as random search and grid search. The field of AutoML has consequently emerged in the quest for automated machine learning processes that are less expensive than brute-force searches. In this paper we continue the research initiated on the Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool based on Evolutionary Algorithms (EAs). EAs are typically slow to converge, which prevents TPOT from scaling to large datasets. We therefore introduce TPOT-SH, inspired by the concept of Successive Halving used in Multi-Armed Bandit problems. This solution allows TPOT to explore the search space faster and to perform much better on larger datasets.
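
For readers unfamiliar with Successive Halving, the following is a minimal sketch of the generic idea only, not the TPOT-SH algorithm itself: every candidate is scored on a small budget, the best fraction is kept, and the budget grows for the survivors. The function name `successive_halving`, the parameters `min_budget` and `eta`, and the reading of "budget" as a training-sample size are all assumptions made for illustration.

```python
def successive_halving(configs, evaluate, min_budget, eta=2):
    """Generic successive halving (illustrative sketch, not TPOT-SH).

    configs    : list of candidate configurations (hypothetical pipelines)
    evaluate   : callable (config, budget) -> score, higher is better
    min_budget : budget given to every candidate in the first round
                 (assumed here to be, e.g., a number of training samples)
    eta        : reduction factor; about 1/eta of the candidates survive a round
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining candidate on the current (cheap) budget.
        scored = [(evaluate(c, budget), c) for c in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the best 1/eta of the candidates, at least one.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        # Survivors earn a larger budget for the next round.
        budget *= eta
    return survivors[0]


if __name__ == "__main__":
    # Toy usage: candidates are numbers, and the score rewards closeness to 0.7.
    candidates = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.8]
    best = successive_halving(
        candidates, lambda c, budget: -abs(c - 0.7), min_budget=100
    )
    print(best)
```

With eight candidates and `eta=2`, the sketch runs three rounds with 8, 4, and 2 candidates, so most of the evaluations happen on the cheapest budget. This is what makes the approach attractive when each full evaluation on a large dataset is expensive.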
