k-decay: A New Method For Learning Rate Schedule
It is well known that the learning rate is one of the most important hyper-parameters in deep learning, and a learning rate schedule is commonly used when training neural networks. This paper proposes a new method for learning rate schedules, named k-decay, which can be applied to any differentiable schedule function to derive a new schedule function. In the new function, the degree of decay is controlled by a new hyper-parameter k, and the original function is recovered as the special case k = 1. The paper applies k-decay to the polynomial, cosine and exponential functions to obtain a new schedule function for each. The method is evaluated using the new polynomial function on the CIFAR-10 and CIFAR-100 datasets with different neural networks (ResNet, Wide ResNet and DenseNet), and the results improve on the state of the art for most of them. Our experiments show that model performance improves as k increases from 1.
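The following is a minimal sketch of how such a schedule might look for the polynomial case. The abstract does not state the formula, so the form used here is an assumption: the normalized time t/T in a standard polynomial decay is replaced by (t/T)^k, which reduces to the original schedule at k = 1. The function name `k_decay_polynomial` and all parameter names are hypothetical.

```python
def k_decay_polynomial(step, total_steps, lr_start, lr_end=0.0, k=1.0, power=1.0):
    """Polynomial learning rate decay with an extra exponent k on normalized time.

    Assumed form (not taken from the abstract):
        lr(t) = (lr_start - lr_end) * (1 - (t / T) ** k) ** power + lr_end
    With k = 1 this is the ordinary polynomial decay schedule.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if k <= 0:
        raise ValueError("k must be positive")
    # Clamp the step so the schedule stays at lr_end after training ends.
    t = min(max(step, 0), total_steps)
    progress = (t / total_steps) ** k
    return (lr_start - lr_end) * (1.0 - progress) ** power + lr_end


if __name__ == "__main__":
    # Compare the schedule for a few values of k at several points in training.
    for k in (1.0, 2.0, 3.0):
        values = [k_decay_polynomial(s, 100, 0.1, k=k) for s in (0, 25, 50, 75, 100)]
        print(k, [round(v, 4) for v in values])
```

Under this assumed form, a larger k keeps the learning rate closer to its starting value for longer and concentrates the decay near the end of training, while the start and end values are unchanged.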