Overfitting Mechanism and Avoidance in Deep Neural Networks

by   Shaeke Salman, et al.

Assisted by the availability of data and high performance computing, deep learning techniques have achieved breakthroughs and surpassed human performance empirically in difficult tasks, including object recognition, speech recognition, and natural language processing. As they are being used in critical applications, understanding underlying mechanisms for their successes and limitations is imperative. In this paper, we show that overfitting, one of the fundamental issues in deep neural networks, is due to continuous gradient updating and scale sensitiveness of cross entropy loss. By separating samples into correctly and incorrectly classified ones, we show that they behave very differently, where the loss decreases in the correct ones and increases in the incorrect ones. Furthermore, by analyzing dynamics during training, we propose a consensus-based classification algorithm that enables us to avoid overfitting and significantly improve the classification accuracy especially when the number of training samples is limited. As each trained neural network depends on extrinsic factors such as initial values as well as training data, requiring consensus among multiple models reduces extrinsic factors substantially; for statistically independent models, the reduction is exponential. Compared to ensemble algorithms, the proposed algorithm avoids overgeneralization by not classifying ambiguous inputs. Systematic experimental results demonstrate the effectiveness of the proposed algorithm. For example, using only 1000 training samples from MNIST dataset, the proposed algorithm achieves 95 significantly higher than any of the individual models, with 90 samples classified.


Interpretable Deep Neural Networks for Patient Mortality Prediction: A Consensus-based Approach

Deep neural networks have achieved remarkable success in challenging tas...

Simultaneous Classification and Novelty Detection Using Deep Neural Networks

Deep neural networks have achieved great success in classification tasks...

Skeptical Deep Learning with Distribution Correction

Recently deep neural networks have been successfully used for various cl...

Reducing Flipping Errors in Deep Neural Networks

Deep neural networks (DNNs) have been widely applied in various domains ...

IRX-1D: A Simple Deep Learning Architecture for Remote Sensing Classifications

We proposes a simple deep learning architecture combining elements of In...

Neural Networks Regularization Through Representation Learning

Neural network models and deep models are one of the leading and state o...

Enlightening Deep Neural Networks with Knowledge of Confounding Factors

Deep learning techniques have demonstrated significant capacity in model...

Please sign up or login with your details

Forgot password? Click here to reset