Recent works have explored the use of weight sparsity to improve the tra...
The pre-training and fine-tuning paradigm has contributed to a number of...
Recent works have shown that deep neural networks benefit from multi-tas...
Recently, many first- and second-order variants of SGD have been proposed ...
We propose dynamic curriculum learning via data parameters for noise rob...
One-hot labels do not represent soft decision boundaries among concepts,...
Localizing natural language phrases in images is a challenging problem t...
Despite the success of CNNs, selecting the optimal architecture for a gi...