Higher Order Linear Transformer

10/28/2020
by Jean Mercat et al.

Following up on the linear transformer of Katharopoulos et al., which builds on an idea from Shen et al., the trick that gives the attention mechanism linear complexity is re-used and extended to a second-order approximation of the softmax normalization.
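
To illustrate the general idea, here is a minimal NumPy sketch of linearized attention with a second-order feature map. It assumes the feature map phi(x) = [1, x, vec(x x^T)/sqrt(2)], for which phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, i.e. a second-order Taylor approximation of exp(q.k); the function and variable names are illustrative and are not taken from the paper's code.

```python
import numpy as np

def second_order_feature_map(x):
    """Map each row x_i of shape (d,) to [1, x_i, vec(x_i x_i^T)/sqrt(2)]."""
    n, d = x.shape
    ones = np.ones((n, 1))
    outer = np.einsum('ni,nj->nij', x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=-1)   # shape (n, 1 + d + d^2)

def linear_attention(Q, K, V):
    """Attention whose cost is linear in sequence length n (at the price of O(d^2) features)."""
    phi_q = second_order_feature_map(Q)   # (n, m)
    phi_k = second_order_feature_map(K)   # (n, m)
    kv = phi_k.T @ V                      # (m, d_v): keys and values summed once over positions
    z = phi_k.sum(axis=0)                 # (m,): normalizer terms
    num = phi_q @ kv                      # (n, d_v)
    den = phi_q @ z                       # (n,)
    return num / den[:, None]

# Tiny usage example
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))
print(linear_attention(Q, K, V).shape)    # (8, 4)
```

Because phi(K)^T V and the normalizer are computed once and reused for every query, the cost grows linearly with sequence length instead of quadratically, which is the trick being extended here.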
