Mnemosyne: Learning to Train Transformers with Transformers

02/02/2023
by Deepali Jain, et al.

Training complex machine learning (ML) architectures requires a compute- and time-consuming process of selecting the right optimizer and tuning its hyperparameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. We propose the Mnemosyne optimizer, which uses Performers: implicit low-rank-attention Transformers. Mnemosyne can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than the popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs, and (c) can initialize optimizers for faster convergence in robotics applications. We believe these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.

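The Performer backbone mentioned in the abstract replaces quadratic softmax attention with an unbiased random-feature approximation, which is what makes the attention implicitly low-rank and linear in sequence length. The sketch below is not Mnemosyne's code; it is a minimal NumPy illustration of FAVOR+-style positive random features, with the function name, feature count m, and tensor shapes chosen purely for the example.

```python
import numpy as np

def performer_attention(Q, K, V, m=256, seed=0):
    """Linear-complexity attention via positive random features (FAVOR+ style).

    Approximates softmax(Q K^T / sqrt(d)) V without ever forming the
    L x L attention matrix, so cost scales as O(L*m*d) rather than O(L^2*d).
    """
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, m))  # random projections w_i ~ N(0, I_d)

    def phi(X):
        # Positive random-feature map for the softmax kernel:
        #   phi(x) = exp(W^T x - ||x||^2 / 2) / sqrt(m),
        # which satisfies E[phi(q)^T phi(k)] = exp(q^T k).
        X = X / d ** 0.25  # fold the 1/sqrt(d) attention scaling into Q and K
        return np.exp(X @ W - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

    Qf, Kf = phi(Q), phi(K)       # (L, m) feature matrices
    KV = Kf.T @ V                 # (m, d) summary of keys and values, built once
    Z = Qf @ Kf.sum(axis=0)       # (L,) per-query softmax normalizer
    return (Qf @ KV) / Z[:, None]

# Toy check: the approximation tightens as the feature count m grows.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
out = performer_attention(Q, K, V, m=512)
print(out.shape)  # (128, 16)
```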
