Contemporary Symbolic Regression Methods and their Relative Performance

by William La Cava, et al.

Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. In this paper, we address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems. Our assessment includes both real-world datasets with no known model form and ground-truth benchmark problems, including physics equations and systems of ordinary differential equations. For the real-world datasets, we benchmark the ability of each method to learn models with low error and low complexity relative to state-of-the-art machine learning methods. For the synthetic problems, we assess each method's ability to find exact solutions in the presence of varying levels of noise. Under these controlled experiments, we conclude that the best performing methods for real-world regression combine genetic algorithms with parameter estimation and/or semantic search drivers. When tasked with recovering exact equations in the presence of noise, we find that deep learning and genetic algorithm-based approaches perform similarly. We provide a detailed guide to reproducing this experiment and contributing new methods, and encourage other researchers to collaborate with us on a common and living symbolic regression benchmark.
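
To make the two evaluation axes concrete, the following is a minimal sketch (not the benchmark's actual code) of how a candidate expression might be scored for error and complexity, and how a ground-truth target might be perturbed with noise. Everything here is an assumption for illustration: the tuple-based expression format, the node-count definition of complexity, the choice of R² as the error measure, the noise model (Gaussian, with standard deviation equal to a level `gamma` times the root-mean-square of the target), and all function names.

```python
import math
import random

# Hypothetical expression format: nested tuples (op, arg, ...), with
# variable names (str) and numeric constants as leaves.
def evaluate(expr, env):
    """Evaluate an expression tree for one assignment of variables."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, (int, float)):
        return float(expr)
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "add":
        return vals[0] + vals[1]
    if op == "mul":
        return vals[0] * vals[1]
    if op == "sin":
        return math.sin(vals[0])
    raise ValueError(f"unknown operator: {op}")

def complexity(expr):
    """Assumed complexity measure: total number of nodes in the tree."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(complexity(a) for a in expr[1:])

def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the error measure."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def add_noise(y, gamma, rng):
    """Assumed noise model: Gaussian noise scaled by the RMS of the target."""
    rms = math.sqrt(sum(v * v for v in y) / len(y))
    return [v + rng.gauss(0.0, gamma * rms) for v in y]

# Illustrative ground-truth problem: y = 2*x + sin(x).
TRUTH = ("add", ("mul", 2.0, "x"), ("sin", "x"))
xs = [i / 10 for i in range(1, 51)]
y_clean = [evaluate(TRUTH, {"x": x}) for x in xs]
y_noisy = add_noise(y_clean, gamma=0.01, rng=random.Random(0))

# Score a candidate model (here, the true expression itself) on noisy data.
candidate = TRUTH
y_pred = [evaluate(candidate, {"x": x}) for x in xs]
score = r2_score(y_noisy, y_pred)
size = complexity(candidate)
```

In a sketch like this, a method would be judged on the pair (`score`, `size`) for real-world data, while for synthetic problems the question is whether the returned expression matches `TRUTH` as `gamma` increases. Checking that match in general requires symbolic simplification, which this example does not attempt.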




