Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing

by   Josh Alman, et al.

Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only consume more in the future. Using current training methods, in each iteration, to process a data point x ∈ℝ^d in a layer, we need to spend Θ(md) time to evaluate all the m neurons in the layer. This means processing the entire layer takes Θ(nmd) time for n data points. Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces this time per iteration to o(nmd), but requires exponential time to preprocess either the data or the neural network weights, making it unlikely to have practical usage. In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration. Our method requires only O(nmd) time in preprocessing and still achieves o(nmd) time per iteration. We complement our new algorithm with a lower bound, proving that assuming a popular conjecture from complexity theory, one could not substantially speed up our algorithm for dynamic detection of firing neurons.


page 1

page 2

page 3

page 4


Does Preprocessing Help Training Over-parameterized Neural Networks?

Deep neural networks have achieved impressive performance in many areas....

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

It is well known that modern deep neural networks are powerful enough to...

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

We consider the problem of training a multi-layer over-parametrized neur...

Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks

Small neural networks with a constrained number of trainable parameters,...

Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data

Deep learning has achieved impressive success in a variety of fields bec...

Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

Deep learning has been widely used in many fields, but the model trainin...

How does Weight Correlation Affect the Generalisation Ability of Deep Neural Networks

This paper studies the novel concept of weight correlation in deep neura...

Please sign up or login with your details

Forgot password? Click here to reset