Deep ReLU Networks Have Surprisingly Simple Polytopes

by   Feng-Lei Fan, et al.

A ReLU network is a piecewise linear function over polytopes. Figuring out the properties of such polytopes is of fundamental importance for the research and development of neural networks. So far, either theoretical or empirical studies on polytopes only stay at the level of counting their number, which is far from a complete characterization of polytopes. To upgrade the characterization to a new level, here we propose to study the shapes of polytopes via the number of simplices obtained by triangulating the polytope. Then, by computing and analyzing the histogram of simplices across polytopes, we find that a ReLU network has relatively simple polytopes under both initialization and gradient descent, although these polytopes theoretically can be rather diverse and complicated. This finding can be appreciated as a novel implicit bias. Next, we use nontrivial combinatorial derivation to theoretically explain why adding depth does not create a more complicated polytope by bounding the average number of faces of polytopes with a function of the dimensionality. Our results concretely reveal what kind of simple functions a network learns and its space partition property. Also, by characterizing the shape of polytopes, the number of simplices be a leverage for other problems, e.g., serving as a generic functional complexity measure to explain the power of popular shortcut networks such as ResNet and analyzing the impact of different regularization strategies on a network's space partition.


page 7

page 21

page 27

page 30

page 32

page 34

page 35

page 39


On the CVP for the root lattices via folding with deep ReLU neural networks

Point lattices and their decoding via neural networks are considered in ...

On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths

This paper studies the global convergence of gradient descent for deep R...

Gradient Descent Quantizes ReLU Network Features

Deep neural networks are often trained in the over-parametrized regime (...

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

The training process of ReLU neural networks often exhibits complicated ...

Implicit Regularization in ReLU Networks with the Square Loss

Understanding the implicit regularization (or implicit bias) of gradient...

Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, Gradient Flow Dynamics

Understanding the learning dynamics and inductive bias of neural network...

Please sign up or login with your details

Forgot password? Click here to reset