Train Deep Neural Networks in 40-D Subspaces
Although deep neural networks have a massive number of parameters, training can actually proceed in a rather low-dimensional space. By investigating this low-dimensional property of the training trajectory, we propose Dynamic Linear Dimensionality Reduction (DLDR), which dramatically reduces the parameter space to a variable subspace of significantly lower dimension. Because only a few variables remain to be optimized, second-order methods become applicable. Following this idea, we develop a quasi-Newton-based algorithm that trains the variables obtained by DLDR rather than the original network parameters. The experimental results strongly support the dimensionality reduction: for many standard neural networks, optimizing over only 40 variables achieves performance comparable to regular training over thousands or even millions of parameters.
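To make the idea concrete, the following is a minimal sketch on a toy linear model, not the authors' implementation. It assumes that the subspace is obtained by principal component analysis of parameter snapshots sampled along an ordinary training run, and that the reduced variables are then trained with a textbook BFGS update and a backtracking line search. All function names (`sample_trajectory`, `dldr_basis`, `train_in_subspace`) and the toy loss are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Toy stand-in for a network: half mean-squared error and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def sample_trajectory(w0, X, y, lr=0.05, steps=30):
    """Run plain gradient descent and record the parameters after each step."""
    w, snapshots = w0.copy(), [w0.copy()]
    for _ in range(steps):
        _, g = loss_and_grad(w, X, y)
        w = w - lr * g
        snapshots.append(w.copy())
    return np.stack(snapshots)              # shape (steps + 1, n_params)

def dldr_basis(snapshots, d):
    """Top-d principal directions of the centered trajectory (assumed PCA step)."""
    centered = snapshots - snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d].T                         # (n_params, d), orthonormal columns

def train_in_subspace(w_start, P, X, y, iters=50):
    """BFGS over the d reduced variables z, with w = w_start + P @ z."""
    d = P.shape[1]
    z, H = np.zeros(d), np.eye(d)           # H approximates the inverse Hessian
    f, g_full = loss_and_grad(w_start, X, y)
    g = P.T @ g_full                        # gradient projected onto the subspace
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        p = -H @ g                          # quasi-Newton search direction
        t = 1.0
        while True:                         # backtracking (Armijo) line search
            f_new, g_full = loss_and_grad(w_start + P @ (z + t * p), X, y)
            if f_new <= f + 1e-4 * t * (g @ p) or t < 1e-12:
                break
            t *= 0.5
        s, g_new = t * p, P.T @ g_full
        yk = g_new - g
        if s @ yk > 1e-12:                  # curvature condition keeps H positive definite
            rho, I = 1.0 / (s @ yk), np.eye(d)
            H = (I - rho * np.outer(s, yk)) @ H @ (I - rho * np.outer(yk, s)) \
                + rho * np.outer(s, s)
        z, f, g = z + s, f_new, g_new
    return w_start + P @ z, f
```

A typical call sequence would be `snaps = sample_trajectory(w0, X, y)`, then `P = dldr_basis(snaps, d=5)`, then `train_in_subspace(w0, P, X, y)`. The point of the sketch is that the inverse-Hessian approximation `H` is only `d × d`, which is why a quasi-Newton method becomes affordable once the search is restricted to a few variables.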