Parallel Dither and Dropout for Regularising Deep Neural Networks

08/28/2015
by Andrew J. R. Simpson

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging, and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complementary.
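
The abstract names three ingredients, dither (additive noise applied to training examples), dropout, and a parallel scheme that replaces batch averaging, without spelling out how they combine. The Python sketch below is one possible reading, not the authors' code: a single training example is replicated into several parallel copies, each copy is independently dithered and dropped out, and the resulting gradients are averaged before one non-batch SGD update. The network size, noise level, dropout rate and number of parallel copies are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# assumed layer sizes for a toy one-hidden-layer network
D_IN, D_HID, D_OUT = 20, 50, 2
W1 = rng.normal(0.0, 0.1, (D_HID, D_IN))
W2 = rng.normal(0.0, 0.1, (D_OUT, D_HID))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grads_one_example(x, y, dither_std=0.1, drop_rate=0.5):
    """Gradient of the cross-entropy loss for a single (x, y) pair,
    with dither on the input and dropout on the hidden layer."""
    x = x + rng.normal(0.0, dither_std, x.shape)   # dither: additive Gaussian noise
    h = np.maximum(0.0, W1 @ x)                    # ReLU hidden layer
    mask = (rng.random(h.shape) >= drop_rate) / (1.0 - drop_rate)
    h = h * mask                                   # dropout with inverted scaling
    p = softmax(W2 @ h)
    t = np.zeros(D_OUT)
    t[y] = 1.0
    dz2 = p - t                                    # softmax + cross-entropy gradient
    dW2 = np.outer(dz2, h)
    dh = (W2.T @ dz2) * mask * (h > 0)             # backprop through dropout and ReLU
    dW1 = np.outer(dh, x)
    return dW1, dW2

def parallel_sgd_step(x, y, n_parallel=8, lr=0.01):
    """Average gradients over n_parallel independently regularised copies
    of one example, then apply a single (non-batch) SGD update."""
    global W1, W2
    g1 = np.zeros_like(W1)
    g2 = np.zeros_like(W2)
    for _ in range(n_parallel):
        dW1, dW2 = grads_one_example(x, y)
        g1 += dW1
        g2 += dW2
    W1 -= lr * g1 / n_parallel
    W2 -= lr * g2 / n_parallel

# toy usage: one update on a random example with class label 1
parallel_sgd_step(rng.normal(size=D_IN), y=1)

With n_parallel = 1 this reduces to ordinary per-example SGD with dither and dropout, the setting the abstract reports as failing without batch averaging; the parallel averaging is what stands in for the batch average.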
