HyperMixer: An MLP-based Green AI Alternative to Transformers

by Florian Mai et al.

Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length and can be difficult to tune. In the pursuit of Green AI, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
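The abstract's key idea, forming the token-mixing MLP dynamically with hypernetworks, can be illustrated with a minimal NumPy sketch. This assumes a formulation in which the token-mixing weights W1 and W2 are generated from the token embeddings themselves, so the mixing operation adapts to each input while keeping cost linear in sequence length; all names and the tiny linear hypernetwork here are illustrative, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU, applied elementwise
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

seq_len, d_model, d_hidden = 6, 8, 4

# Hypothetical hypernetwork parameters: each maps a token embedding
# to one row of the generated token-mixing weight matrices.
Wh1 = rng.normal(scale=0.1, size=(d_model, d_hidden))
Wh2 = rng.normal(scale=0.1, size=(d_model, d_hidden))

def hypermixer_token_mixing(X):
    """X: (seq_len, d_model) token embeddings -> (seq_len, d_model)."""
    W1 = X @ Wh1                  # (seq_len, d_hidden), generated per input
    W2 = X @ Wh2                  # (seq_len, d_hidden), generated per input
    # Token-mixing MLP: information flows across the sequence axis.
    return W2 @ gelu(W1.T @ X)    # (seq_len, d_model)

X = rng.normal(size=(seq_len, d_model))
Y = hypermixer_token_mixing(X)
assert Y.shape == (seq_len, d_model)
```

Unlike self-attention's quadratic N×N score matrix, every matrix product above is of size at most seq_len × d_hidden or d_hidden × d_model, so the cost grows linearly with sequence length, which is the efficiency argument the abstract makes.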


Related papers:

- Calibration of Natural Language Understanding Models with Venn–ABERS Predictors
- The Unstoppable Rise of Computational Linguistics in Deep Learning
- Benchmarking Transformers-based models on French Spoken Language Understanding tasks
- An Analysis of Negation in Natural Language Understanding Corpora
- Sumformer: Universal Approximation for Efficient Transformers
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
