Adaptable Adapters

by   Nafise Sadat Moosavi, et al.

State-of-the-art pretrained NLP models contain a hundred million to trillion parameters. Adapters provide a parameter-efficient alternative for the full finetuning in which we can only finetune lightweight neural network layers on top of pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture – i.e., the same adapter layer on top of each layer of the pretrained model – for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters that contain (1) learning different activation functions for different layers and different input data, and (2) a learnable switch to select and only use the beneficial adapter layers. We show that adaptable adapters achieve on-par performances with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the selected adapter architecture by adaptable adapters transfers well across different data settings and similar tasks. We propose to use adaptable adapters for designing efficient and effective adapter architectures. The resulting adapters (a) contain about 50 of the learning parameters of the standard adapter and are therefore more efficient at training and inference, and require less storage space, and (b) achieve considerably higher performances in low-data settings.


page 1

page 2

page 3

page 4


Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Adapting large-scale pretrained language models to downstream tasks via ...

SubTuning: Efficient Finetuning for Multi-Task Learning

Finetuning a pretrained model has become a standard approach for trainin...

Transformers with Learnable Activation Functions

Activation functions can have a significant impact on reducing the topol...

Learning Features with Parameter-Free Layers

Trainable layers such as convolutional building blocks are the standard ...

Learning Neural Network Architectures using Backpropagation

Deep neural networks with millions of parameters are at the heart of man...

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

Language Models pretrained on large textual data have been shown to enco...

Please sign up or login with your details

Forgot password? Click here to reset