Implicitly Maximizing Margins with the Hinge Loss
We propose a new loss function for neural-network classification that extends the hinge loss by assigning gradients to its critical points. We show that for a linear classifier on linearly separable data trained with a fixed step size, the margin under this modified hinge loss converges to the ℓ_2 max-margin at a rate of 𝒪(1/t). This rate is fast compared with the 𝒪(1/log t) rate of exponential losses such as the logistic loss. Furthermore, empirical results suggest that this increased convergence speed carries over to ReLU networks.
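To make the idea concrete, here is a minimal sketch of a hinge-style subgradient that keeps a small slope on the otherwise-flat region, so correctly classified points still receive a gradient. The specific form (`flat_slope`, the toy data, and the training loop) is a hypothetical illustration, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: two Gaussian blobs (illustrative only).
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)),
               rng.normal(-2.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

def grad_modified_hinge(w, X, y, flat_slope=0.1):
    """Subgradient of a hinge-like loss whose flat region is replaced
    by a small slope `flat_slope` (a hypothetical modification; the
    paper's exact loss may differ)."""
    margins = y * (X @ w)
    # Coefficient 1 on violated margins, `flat_slope` on satisfied ones,
    # so the gradient never vanishes and the margin keeps growing.
    coeff = np.where(margins < 1.0, 1.0, flat_slope)
    return -(coeff * y) @ X / len(y)

w = np.zeros(2)
eta = 0.1  # fixed step size, matching the abstract's setting
for _ in range(2000):
    w -= eta * grad_modified_hinge(w, X, y)

# Normalized (geometric) margin of the learned linear classifier.
margin = np.min(y * (X @ w)) / np.linalg.norm(w)
print(margin)
```

On this toy problem, the normalized margin increases over training because the nonzero slope on satisfied margins keeps pushing the weight direction toward the max-margin separator.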