Does Preprocessing Help Training Over-parameterized Neural Networks?

by Zhao Song, et al.

Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method requires Ω(mnd) cost for both the forward and backward computation, where m is the width of the neural network and we are given n training points in d-dimensional space. In this paper, we propose two novel preprocessing ideas to bypass this Ω(mnd) barrier:

∙ First, by preprocessing the initial weights of the neural network, we can train it at a cost of O(m^{1-Θ(1/d)} nd) per iteration.

∙ Second, by preprocessing the input data points, we can train the neural network at a cost of O(m^{4/5} nd) per iteration.

From a technical perspective, our result is a sophisticated combination of tools from different fields: greedy-type convergence analysis in optimization, sparsity observations from practical work, high-dimensional geometric search data structures, and concentration and anti-concentration bounds in probability. Our results also provide theoretical insight into a large number of previously established fast training methods. In addition, our classical algorithm can be generalized to the quantum computation model. Interestingly, we can obtain a similar sublinear cost per iteration while avoiding preprocessing of the initial weights or the input data points.
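The sparsity observation underlying the sublinear cost bounds can be illustrated with a minimal sketch. This is not the paper's algorithm; it only shows the phenomenon the abstract alludes to: with a thresholded (shifted-ReLU-style) activation and random Gaussian initial weights, only a small fraction of the m neurons fire on any given input, so a data structure that quickly returns the firing neurons would let each iteration touch far fewer than all m rows. All concrete values (m, d, the threshold b) are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch only: count how many of m random neurons "fire"
# on a unit-norm input under a shifted activation sigma(t) = max(t - b, 0).
rng = np.random.default_rng(0)

m, d, b = 10_000, 64, 2.0                 # width, input dimension, threshold (assumed)
W = rng.standard_normal((m, d))           # random initial weight matrix
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                    # unit-norm data point

pre = W @ x                               # each pre-activation is approximately N(0, 1)
active = np.flatnonzero(pre > b)          # indices of firing neurons
output = np.maximum(pre - b, 0.0).sum()   # forward pass under the shifted activation

# A Gaussian tail bound gives roughly an exp(-b^2 / 2) fraction of active
# neurons; a high-dimensional geometric search structure that returns only
# `active` would avoid scanning all m rows of W per iteration.
print(f"{len(active)} of {m} neurons active ({len(active) / m:.2%})")
```

With the threshold b = 2, roughly 2% of neurons are active, which is the kind of gap a geometric search structure can exploit.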




Related research

∙ A Sublinear Adversarial Training Algorithm: "Adversarial training is a widely used strategy for making neural network..."

∙ Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data: "Deep learning has achieved impressive success in a variety of fields bec..."

∙ Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing: "Over the last decade, deep neural networks have transformed our society,..."

∙ Classifying topological sector via machine learning: "We employ a machine learning technique for an estimate of the topologica..."

∙ Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures: "In this paper, we study the problem of speeding up a type of optimizatio..."

∙ Training Overparametrized Neural Networks in Sublinear Time: "The success of deep learning comes at a tremendous computational and ene..."

∙ A Study on the Behavior of a Neural Network for Grouping the Data: "One of the frequently stated advantages of neural networks is that they ..."
