Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

03/28/2020
by Amit Daniely, et al.

We prove that a single step of gradient descent over a depth-two network with q hidden neurons, starting from orthogonal initialization, can memorize Ω(dq/log^4(d)) independent, randomly labeled Gaussians in R^d. The result holds for a large class of activation functions, which includes the absolute value.
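To make the setting concrete, here is a minimal NumPy sketch of the experiment the abstract describes: m Gaussian points in R^d with independent ±1 labels, a depth-two network f(x) = Σ_j u_j |⟨w_j, x⟩| with q hidden neurons and orthonormal first-layer rows, and one full-batch gradient step on the first layer. Several details are assumptions made for illustration and are not taken from the paper: the linear (correlation) loss -y·f(x), the fixed random output weights u_j = ±1/√q, the learning rate, and the sizes d, q, m. The helper names are hypothetical. The sketch only measures the fraction of training points whose sign is fit after the step; it does not reproduce the paper's analysis or its Ω(dq/log^4(d)) bound.

```python
import numpy as np

def orthogonal_init(q, d, rng):
    """Return a q x d matrix with orthonormal rows (requires q <= d)."""
    a = rng.standard_normal((d, q))
    basis, _ = np.linalg.qr(a)  # reduced QR: d x q with orthonormal columns
    return basis.T

def forward(W, u, X):
    """Depth-two network f(x) = sum_j u_j * |<w_j, x>| for each row x of X."""
    return np.abs(X @ W.T) @ u

def one_gd_step(W, u, X, y, lr):
    """One full-batch gradient step on the first layer for the assumed loss mean(-y * f(x))."""
    pre = X @ W.T  # m x q pre-activations
    # dL/dw_j = -(1/m) * sum_i y_i * u_j * sign(<w_j, x_i>) * x_i
    coeff = (y[:, None] * np.sign(pre)) * u[None, :]  # m x q
    grad = -(coeff.T @ X) / X.shape[0]  # q x d
    return W - lr * grad

def memorization_accuracy(d=64, q=32, m=200, lr=1.0, seed=0):
    """Fraction of the m random points whose label matches sign(f(x)) after one step."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, d))  # m independent standard Gaussians in R^d
    y = rng.choice([-1.0, 1.0], size=m)  # independent uniformly random labels
    W = orthogonal_init(q, d, rng)
    u = rng.choice([-1.0, 1.0], size=q) / np.sqrt(q)  # fixed output layer (assumption)
    W1 = one_gd_step(W, u, X, y, lr)
    preds = np.sign(forward(W1, u, X))
    return float(np.mean(preds == y))

if __name__ == "__main__":
    print(memorization_accuracy())
```

Only the first layer is updated, which matches the abstract's framing of a single gradient step. Varying m relative to the parameter count dq is the natural knob for probing the "no over-parameterization" regime.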
