Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos

by Adrien Courtois, et al.

Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for solving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the interpolation capabilities of trained neural networks? In this paper we discuss a theorem of Domingos stating that "every machine learned by continuous gradient descent is approximately a kernel machine". According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos' result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the "neural tangent kernel" arising in Domingos' theorem does provide understanding of the networks' predictions. Furthermore, when the task given to the network grows in complexity, the interpolation capability of the network can be effectively explained by Domingos' theorem, and therefore is limited. We illustrate this fact on a classic perception theory problem: recovering a shape from its boundary.
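The discrete form of the statement can be checked numerically. In the sketch below (a minimal illustration, not the authors' code; the toy network, data, and step sizes are our own assumptions), a small network is trained by plain gradient descent while, in parallel, the change of its output on test points is reconstructed as a sum of tangent-kernel terms, -η Σ_t Σ_i L'_i(t) K_t(x, x_i) with K_t(x, x') = ∇_w f_t(x)·∇_w f_t(x'). For a small learning rate the two quantities nearly coincide, which is exactly the "approximately a kernel machine" claim:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network f(x) = v . tanh(W x + b), 8 hidden units.
def init_params():
    return {"W": rng.normal(size=(8, 1)) * 0.5,
            "b": np.zeros(8),
            "v": rng.normal(size=8) * 0.5}

def forward(p, x):                       # x: (n, 1) -> outputs (n,)
    return np.tanh(x @ p["W"].T + p["b"]) @ p["v"]

def grads_per_example(p, x):
    """Gradient of f(x_i) w.r.t. all parameters, flattened: shape (n, 24)."""
    t = np.tanh(x @ p["W"].T + p["b"])   # (n, 8)
    dt = (1 - t**2) * p["v"]             # (n, 8), df/d(pre-activation)
    gW = dt[:, :, None] * x[:, None, :]  # (n, 8, 1)
    return np.concatenate([gW.reshape(len(x), -1), dt, t], axis=1)

# Regression data: fit y = sin(x) on 10 points; 2 held-out test points.
X = np.linspace(-2, 2, 10)[:, None]
y = np.sin(X[:, 0])
Xtest = np.array([[0.5], [-1.3]])

p = init_params()
eta = 2e-4
f0_test = forward(p, Xtest)
kernel_delta = np.zeros(len(Xtest))      # kernel-style reconstruction of f - f0

for step in range(3000):
    r = forward(p, X) - y                # dL/df for squared loss 0.5 * sum(r^2)
    G = grads_per_example(p, X)          # (10, 24)
    Gt = grads_per_example(p, Xtest)     # (2, 24)
    # Path-kernel accumulation: -eta * sum_i r_i * K_t(x, x_i),
    # with tangent kernel K_t(x, x') = grad f(x) . grad f(x').
    kernel_delta += -eta * (Gt @ G.T) @ r
    # Ordinary gradient-descent update, built from the same per-example grads.
    g = G.T @ r                          # (24,)
    p["W"] -= eta * g[:8].reshape(8, 1)
    p["b"] -= eta * g[8:16]
    p["v"] -= eta * g[16:]

true_delta = forward(p, Xtest) - f0_test
# The kernel reconstruction tracks the actual change of the network's output.
print("true change:   ", true_delta)
print("kernel estimate:", kernel_delta)
```

The discrepancy between `true_delta` and `kernel_delta` comes from the per-step linearization and shrinks with the learning rate, mirroring why Domingos' result is exact only in the continuous (gradient-flow) limit.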




