Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

09/20/2023
by Yukang Xie, et al.

Recently, Large Language Models (LLMs) have achieved impressive zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially generative text tasks. Yet, the large size of LLMs often leads to high computational costs for model training and online deployment. In our work, we present ALTER, a system that effectively builds multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture of the underlying model, capturing both intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaboration between adapters at a small computational cost. Experimental results over a mixture of NLP tasks show that our proposed MTA architecture and the two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains.
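The abstract describes the architecture only at a high level. As a rough illustration, below is a minimal sketch of what a Mixture-of-Task-Adapters layer could look like, assuming standard bottleneck adapters and a task-conditioned gate that mixes one per-task adapter (intra-task knowledge) with one shared adapter (inter-task knowledge). All class names, the gating scheme, and the hyperparameters here are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a Mixture-of-Task-Adapters (MTA) layer; the paper's
# exact design is not specified in the abstract.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, plus a residual connection."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class MixtureOfTaskAdapters(nn.Module):
    """One adapter per task (intra-task) mixed with a shared adapter (inter-task)."""

    def __init__(self, hidden_size: int, num_tasks: int, bottleneck: int = 64):
        super().__init__()
        self.task_adapters = nn.ModuleList(
            [BottleneckAdapter(hidden_size, bottleneck) for _ in range(num_tasks)]
        )
        self.shared_adapter = BottleneckAdapter(hidden_size, bottleneck)
        # Learned, task-conditioned weights over [own task adapter, shared adapter].
        self.gate = nn.Embedding(num_tasks, 2)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        w = torch.softmax(self.gate.weight[task_id], dim=-1)
        return w[0] * self.task_adapters[task_id](x) + w[1] * self.shared_adapter(x)


# Usage: insert one MTA module after the feed-forward sublayer of each
# transformer block in a small (<1B parameter) backbone, freeze the backbone,
# and route each batch with its task id.
mta = MixtureOfTaskAdapters(hidden_size=768, num_tasks=4)
hidden_states = torch.randn(2, 16, 768)  # (batch, seq_len, hidden)
out = mta(hidden_states, task_id=1)      # same shape as the input
```

Under this reading, the two-stage training method might, for instance, first fit each task adapter on its own task and then jointly tune the gates and the shared adapter over the full task mixture; the abstract does not spell out the actual stages.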


Related research

09/06/2021 · GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain
Deep neural language models have set new breakthroughs in many tasks of ...

05/23/2022 · DistilCamemBERT: a distillation of the French model CamemBERT
Modern Natural Language Processing (NLP) models based on Transformer str...

05/21/2023 · GPT-3.5 vs GPT-4: Evaluating ChatGPT's Reasoning Performance in Zero-shot Learning
Large Language Models (LLMs) have exhibited remarkable performance on va...

05/24/2022 · BabyBear: Cheap inference triage for expensive language models
Transformer language models provide superior accuracy over previous mode...

03/29/2023 · AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Many natural language processing (NLP) tasks rely on labeled data to tra...

04/19/2020 · The Cost of Training NLP Models: A Concise Overview
We review the cost of training large-scale language models, and the driv...

08/01/2022 · Few-shot Adaptation Works with UnpredicTable Data
Prior work on language models (LMs) shows that training on a large numbe...
