Approximate bi-level optimization (ABLO) consists of (outer-level) optim...
The Transformer architecture has revolutionized deep learning on sequent...
We introduce Performers, Transformer architectures which can estimate re...
Bilevel optimization (BLO) is a popular approach with many applications...
Transformer models have achieved state-of-the-art results across a diver...
In this paper we propose a new approach for optimization over orthogonal...
We present a new class of stochastic, geometrically-driven optimization...