It Is Likely That Your Loss Should be a Likelihood
We recall that certain common losses are simplified likelihoods and instead argue for optimizing full likelihoods that include their parameters, such as the variance of the normal distribution and the temperature of the softmax distribution. Joint optimization of likelihood and model parameters can adaptively tune the scales and shapes of losses and the weights of regularizers. We survey and systematically evaluate how to parameterize and apply likelihood parameters for robust modeling and re-calibration. Additionally, we propose adaptively tuning L_2 and L_1 weights by fitting the scale parameters of normal and Laplace priors and introduce more flexible element-wise regularizers.
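As an illustration of the central idea, the following minimal sketch (not the authors' implementation) treats squared error as a normal negative log-likelihood and fits the scale jointly with the model. The "model" here is assumed to be a single constant prediction `mu`, and the names `gaussian_nll`, `fit_mean_and_scale`, `steps`, and `lr` are hypothetical choices made for this example. With the scale fixed at 1, the likelihood reduces to mean squared error plus a constant; once `log_sigma` is free, gradient descent pulls it toward the scale of the residuals.

```python
import math


def gaussian_nll(residuals, log_sigma):
    """Mean negative log-likelihood of residuals under N(0, sigma^2).

    The scale is parameterized as log_sigma so that sigma stays positive
    under unconstrained gradient updates (an assumption of this sketch).
    """
    sigma2 = math.exp(2.0 * log_sigma)
    mse = sum(r * r for r in residuals) / len(residuals)
    return 0.5 * math.log(2.0 * math.pi) + log_sigma + 0.5 * mse / sigma2


def fit_mean_and_scale(ys, steps=2000, lr=0.05):
    """Jointly fit a constant prediction mu and log_sigma by gradient descent."""
    mu, log_sigma = 0.0, 0.0
    n = len(ys)
    for _ in range(steps):
        sigma2 = math.exp(2.0 * log_sigma)
        mse = sum((y - mu) ** 2 for y in ys) / n
        # Analytic gradients of the mean NLL with respect to each parameter.
        grad_mu = -sum(y - mu for y in ys) / (n * sigma2)
        grad_log_sigma = 1.0 - mse / sigma2
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, log_sigma


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 4.0, 5.0]
    mu, log_sigma = fit_mean_and_scale(ys)
    print("mu:", mu, "sigma^2:", math.exp(2.0 * log_sigma))
```

On this toy data the fitted variance approaches the mean squared residual, which is the maximum-likelihood estimate of the variance. The same pattern, one extra learnable scalar optimized alongside the model, would apply to a softmax temperature, under the same caveat that this is a sketch rather than the paper's procedure.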