Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex

by Hongyang Zhang et al.

Several recently proposed neural-network architectures, such as ResNeXt, Inception, Xception, SqueezeNet, and Wide ResNet, are built around the idea of multiple parallel branches and have demonstrated improved performance in many applications. We show that one cause of this success is that the multi-branch architecture is less non-convex in terms of duality gap. The duality gap measures the degree of intrinsic non-convexity of an optimization problem: a smaller gap in relative value implies a lower degree of intrinsic non-convexity. The challenge is to quantitatively measure the duality gap of highly non-convex problems such as deep neural networks. In this work, we provide strong guarantees on this quantity for two classes of network architectures. For neural networks with arbitrary activation functions, a multi-branch architecture, and a variant of the hinge loss, we show that the duality gap of both the population and the empirical risk shrinks to zero as the number of branches increases. This result sheds light on the power of over-parametrization: increasing the network width tends to make the loss surface less non-convex. For neural networks with a linear activation function and ℓ_2 loss, we show that the duality gap of the empirical risk is zero. Both results hold for arbitrary depths and adversarial data, and the analytical techniques may be of independent interest to non-convex optimization more broadly. Experiments on both synthetic and real-world datasets validate our results.
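The multi-branch setting the abstract describes can be sketched numerically. Below is a minimal NumPy illustration (a toy parametrization of our own, not the paper's exact construction) of a network whose prediction is the sum of I independent branch sub-networks, evaluated against a hinge-style loss; all function and variable names here are illustrative assumptions:

```python
import numpy as np

def multi_branch_forward(x, branch_params, activation=np.tanh):
    """Sum the outputs of I independent branch sub-networks.

    Each branch is a small two-layer network; the final prediction
    sums the scalar branch outputs, as in ResNeXt-style designs.
    """
    outputs = []
    for W1, b1, w2 in branch_params:
        h = activation(x @ W1 + b1)   # branch hidden layer
        outputs.append(h @ w2)        # scalar branch output
    return np.sum(outputs, axis=0)

rng = np.random.default_rng(0)
d, hidden, I = 4, 8, 16              # input dim, per-branch width, branches
params = [(rng.normal(size=(d, hidden)) / np.sqrt(d),
           np.zeros(hidden),
           rng.normal(size=hidden) / np.sqrt(hidden * I))
          for _ in range(I)]

x = rng.normal(size=(5, d))          # a batch of 5 inputs
y = np.array([1., -1., 1., 1., -1.]) # toy binary labels
y_hat = multi_branch_forward(x, params)

# Hinge-style surrogate loss on the summed branch outputs.
hinge_loss = np.maximum(0.0, 1.0 - y * y_hat).mean()
print(y_hat.shape, hinge_loss >= 0.0)
```

Increasing I widens the network by adding branches rather than deepening it; the paper's result says that, in its setting, this is exactly the regime in which the duality gap of the risk shrinks toward zero.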


Parallel Deep Neural Networks Have Zero Duality Gap

Training deep neural networks is a well-known highly non-convex problem....

The Curious Case of Convex Networks

In this paper, we investigate a constrained formulation of neural networ...

Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Generative Adversarial Networks (GANs) are commonly used for modeling co...

Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys

Neural networks provide a rich class of high-dimensional, non-convex opt...

Global Optimality in Tensor Factorization, Deep Learning, and Beyond

Techniques involving factorization are found in a wide range of applicat...

Risk-Constrained Nonconvex Dynamic Resource Allocation has Zero Duality Gap

We show that risk-constrained dynamic resource allocation problems with ...

Successive Affine Learning for Deep Neural Networks

This paper introduces a successive affine learning (SAL) model for const...
