Phase transitions in the mini-batch size for sparse and dense neural networks

05/10/2023
by Raffaele Marino, et al.

The use of mini-batches of data in training artificial neural networks is nowadays very common. Despite its broad usage, quantitative theories of how large or small the optimal mini-batch size should be are missing. This work presents a systematic attempt at understanding the role of the mini-batch size in training two-layer neural networks. Working in the teacher-student scenario, with a sparse teacher, and focusing on tasks of different complexity, we quantify the effects of changing the mini-batch size m. We find that the generalization performance of the student often depends strongly on m and may undergo sharp phase transitions at a critical value m_c, such that for m < m_c the training process fails, while for m > m_c the student learns the teacher perfectly or generalizes very well. Such phase transitions are induced by collective phenomena first discovered in statistical mechanics and later observed in many fields of science. Finding a phase transition when varying the mini-batch size raises several important questions on the role of a hyperparameter that has been somewhat overlooked until now.
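To make the setup concrete, below is a minimal sketch of a teacher-student experiment of the kind the abstract describes: a sparse two-layer teacher labels random inputs, a dense student is trained by mini-batch SGD, and the test error is recorded as the mini-batch size m is scanned. Everything here is an illustrative assumption rather than the paper's exact protocol: the soft-committee architecture with tanh units, the sizes N and K, the sparsity level, the learning rate, and the helper names forward and train are all chosen for the sketch, and fresh samples are drawn at every step (online learning).

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 50, 3          # input dimension and hidden units (illustrative sizes)
sparsity = 0.2        # fraction of nonzero teacher weights (assumed value)

# Sparse teacher: a two-layer soft committee machine with most weights zeroed.
W_teacher = rng.standard_normal((K, N)) * (rng.random((K, N)) < sparsity)

def forward(W, X):
    """Committee output: average of tanh hidden units, one value per input."""
    return np.tanh(X @ W.T).mean(axis=1)

def train(m, steps=10_000, lr=0.5):
    """Online mini-batch SGD of a dense student on teacher labels;
    returns the generalization (test) MSE after training."""
    W = rng.standard_normal((K, N)) * 0.1
    for _ in range(steps):
        X = rng.standard_normal((m, N))       # fresh mini-batch of size m
        y = forward(W_teacher, X)             # teacher labels
        H = np.tanh(X @ W.T)                  # student hidden activations, (m, K)
        err = H.mean(axis=1) - y              # per-example residual
        # Gradient of 0.5 * mean(err**2) with respect to the student weights.
        grad = ((err[:, None] * (1 - H**2) / K).T @ X) / m
        W -= lr * grad
    X_test = rng.standard_normal((10_000, N))
    return np.mean((forward(W, X_test) - forward(W_teacher, X_test)) ** 2)

# Scan m and look for a sharp change in test error at some critical value.
for m in (1, 2, 4, 8, 16, 32, 64):
    print(f"m = {m:3d}   test MSE = {train(m):.4f}")
```

In the regime the abstract describes, one would expect the printed test error to stay high for m below some m_c and to drop sharply above it; whether this toy configuration reproduces that behavior depends on the assumed sizes and learning rate.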
