Fast Feedforward Networks

08/28/2023
by   Peter Belcak, et al.

We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execution. Pushing FFFs to the limit, we show that they can use as little as 1% of layer neurons for inference in vision transformers while preserving 94.2% of predictive performance.
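
To make the "log-time" claim concrete, the following is a minimal sketch of conditional execution over a binary tree, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the heap-ordered tree layout, the single-neuron routing nodes, the small ReLU leaf networks, the random weights, and the names `make_fff` and `fff_forward` are all hypothetical. It covers inference only and makes no claim about how such a layer is trained.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def make_fff(depth, in_dim, leaf_width, out_dim, seed=0):
    """Build a hypothetical tree-structured layer with random weights.

    Assumed layout: 2**depth - 1 routing nodes in heap order
    (children of node i are 2*i + 1 and 2*i + 2) and 2**depth leaves.
    """
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # Each routing node is one linear neuron: (weights, bias).
    nodes = [(vec(in_dim), rng.uniform(-1.0, 1.0))
             for _ in range(2 ** depth - 1)]
    # Each leaf is a tiny one-hidden-layer ReLU network.
    leaves = [
        {"w1": [vec(in_dim) for _ in range(leaf_width)],
         "w2": [vec(leaf_width) for _ in range(out_dim)]}
        for _ in range(2 ** depth)
    ]
    return {"depth": depth, "nodes": nodes, "leaves": leaves}


def fff_forward(model, x):
    """Route x down the tree with hard decisions, then run one leaf.

    Returns (output, routing_nodes_evaluated, leaf_index).
    """
    depth = model["depth"]
    idx = 0          # start at the root
    evaluated = 0
    for _ in range(depth):
        w, b = model["nodes"][idx]
        evaluated += 1
        go_right = dot(w, x) + b > 0
        idx = 2 * idx + 1 + int(go_right)
    # Convert the heap index below the last routing level to a leaf index.
    leaf_idx = idx - (2 ** depth - 1)
    leaf = model["leaves"][leaf_idx]
    hidden = [max(0.0, dot(w, x)) for w in leaf["w1"]]
    out = [dot(w, hidden) for w in leaf["w2"]]
    return out, evaluated, leaf_idx


model = make_fff(depth=4, in_dim=3, leaf_width=2, out_dim=2)
output, evaluated, leaf_idx = fff_forward(model, [0.5, -0.2, 0.1])
print(f"routing nodes evaluated: {evaluated}, leaf used: {leaf_idx}")
```

In this sketch a tree of depth 4 holds 16 leaves, yet a single input touches only 4 routing neurons and 1 leaf. Doubling the number of leaves adds one routing step, which is the sense in which the cost grows logarithmically rather than linearly with layer size.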
