Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

10/03/2019
by   Yu Bai, et al.

Recent theoretical work has established connections between over-parametrized neural networks and linearized models governed by the Neural Tangent Kernels (NTKs). NTK theory leads to concrete convergence and generalization results, yet the empirical performance of neural networks is observed to exceed that of their linearized models, suggesting that this theory is insufficient. Towards closing this gap, we investigate the training of over-parametrized neural networks that are beyond the NTK regime yet still governed by the Taylor expansion of the network. We bring forward the idea of randomizing the neural networks, which allows them to escape their NTK and couple with quadratic models. We show that the optimization landscape of randomized two-layer networks is benign and amenable to escaping-saddle algorithms. We prove concrete generalization and expressivity results on these randomized networks, which lead to sample complexity bounds (for learning certain simple functions) that match the NTK and can in addition be better by a dimension factor under mild distributional assumptions. We demonstrate that our randomization technique can be generalized systematically beyond the quadratic case, by using it to find networks that couple with higher-order terms in their Taylor series.

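As a minimal illustration of the Taylor-expansion view described in the abstract (the network, shapes, and perturbation scale below are our own assumptions for the sketch, not details from the paper), the following JAX snippet compares a two-layer network's output with its first-order (NTK-style) linearization and its second-order Taylor expansion around a random initialization:

# Hypothetical sketch: Taylor expansion of a wide two-layer network around init.
import jax
import jax.numpy as jnp

def two_layer(w, x):
    # f(w; x) = a^T relu(W x), with parameters w = (W, a)
    W, a = w
    return jnp.dot(a, jax.nn.relu(W @ x))

key = jax.random.PRNGKey(0)
d, m = 8, 1024                      # input dimension, width (large width ~ NTK regime)
kW, ka, kx = jax.random.split(key, 3)
W0 = jax.random.normal(kW, (m, d)) / jnp.sqrt(d)
a0 = jax.random.normal(ka, (m,)) / jnp.sqrt(m)
w0 = (W0, a0)
x = jax.random.normal(kx, (d,))

# small perturbation dw of the initialization
dW = 0.01 * jax.random.normal(kW, (m, d))
da = 0.01 * jax.random.normal(ka, (m,))
dw = (dW, da)

f = lambda w: two_layer(w, x)
f0 = f(w0)

# first-order term <grad f(w0), dw> via a Jacobian-vector product (the NTK feature map)
_, lin = jax.jvp(f, (w0,), (dw,))

# second-order term dw^T Hessian(f)(w0) dw via a nested jvp (forward-over-forward)
_, quad2 = jax.jvp(lambda w: jax.jvp(f, (w,), (dw,))[1], (w0,), (dw,))

w1 = (W0 + dW, a0 + da)
print("exact:               ", f(w1))
print("linear (NTK) approx: ", f0 + lin)
print("quadratic approx:    ", f0 + lin + 0.5 * quad2)

For small perturbations at large width, the linear term dominates (the NTK regime); the quadratic term is the next-order correction of the kind the paper's randomization technique couples the network to. This sketch only illustrates the expansion itself, not the paper's randomization or training procedure.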