Learning Curve Theory

02/08/2021
by Marcus Hutter, et al.

Recently a number of empirical "universal" scaling-law papers have been published, most notably by OpenAI. "Scaling laws" refers to power-law decreases of training or test error w.r.t. more data, larger neural networks, and/or more compute. In this work we focus on scaling w.r.t. data size n. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models, for which the error typically decreases with n^{-1/2} or n^{-1}, where n is the sample size. We develop and theoretically analyse the simplest possible (toy) model that can exhibit n^{-β} learning curves for arbitrary power β > 0, and determine whether power laws are universal or depend on the data distribution.
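To make the n^{-β} power-law notion concrete, here is a minimal illustrative sketch (not from the paper): it generates a synthetic learning curve err(n) = c · n^{-β} with noise and recovers β by a straight-line fit in log-log space, which is how such exponents are typically estimated from empirical error measurements. All constants and the noise model are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): synthesize an
# err(n) = c * n^(-beta) learning curve and estimate beta from it.
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" power law; beta = 1/2 is the typical finite-dimensional rate.
c_true, beta_true = 2.0, 0.5
n = np.logspace(2, 6, num=20)                     # sample sizes 10^2 .. 10^6
err = c_true * n ** (-beta_true)
err *= np.exp(rng.normal(scale=0.05, size=n.size))  # multiplicative noise

# In log space the power law is linear: log err = log c - beta * log n,
# so the slope of a least-squares line estimates -beta.
slope, intercept = np.polyfit(np.log(n), np.log(err), deg=1)
print(f"estimated beta = {-slope:.3f}, estimated c = {np.exp(intercept):.3f}")
```

On a log-log plot such a curve is a straight line with slope -β; deviations from linearity signal that the error does not follow a single power law over the measured range.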
