2x Faster Language Model Pre-training via Masked Structural Growth

05/04/2023
by Yiqun Yao, et al.

Accelerating large language model pre-training is a critical issue in present NLP research. In this paper, we focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one. There are two main research problems associated with progressive growth: the growth schedule and the growth operator. For the growth schedule, existing work has explored multi-stage expansion of depth and feedforward layers, but the impact of each growth dimension on the schedule's efficiency remains an open question. For the growth operator, existing work relies on the initialization of new weights to inherit knowledge and achieves only non-strict function preservation, which limits further optimization of training dynamics. To address these issues, we propose Masked Structural Growth (MSG), which includes growth schedules involving all possible dimensions and strictly function-preserving growth operators that are independent of the initialization of new weights. Experiments show that MSG is significantly faster than related work: we achieve a speed-up of 80% (Bert-base) and 120% (Bert-large) in pre-training, and improve fine-tuning performances at the same time.
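The key property claimed in the abstract, strictly function-preserving growth independent of new-weight initialization, can be illustrated with a width-growth example: each newly added structure (e.g., a new hidden neuron) is gated by a mask that starts at zero, so the expanded network computes exactly the same function as before growth regardless of how the new weights are initialized; the mask is then gradually ramped toward one during continued training. Below is a minimal PyTorch sketch of this masking idea for a two-layer MLP. It is an illustration based on the abstract's description, not the authors' implementation; the names MaskedGrowthMLP and grow_hidden are hypothetical.

```python
import torch
import torch.nn as nn

class MaskedGrowthMLP(nn.Module):
    """Toy two-layer MLP whose hidden width grows with strict function
    preservation via masking (illustrative sketch, not the paper's code)."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        # One mask value per hidden unit; existing units are fully active.
        self.register_buffer("mask", torch.ones(d_hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # New units contribute nothing while their mask entry is 0.
        h = torch.relu(self.fc1(x)) * self.mask
        return self.fc2(h)

    @torch.no_grad()
    def grow_hidden(self, d_new: int) -> None:
        """Widen the hidden layer to d_new units. The new weights may be
        initialized arbitrarily (randomly here): their mask starts at 0,
        so the network's output is unchanged at growth time."""
        d_old = self.fc1.out_features
        assert d_new > d_old
        fc1 = nn.Linear(self.fc1.in_features, d_new)
        fc2 = nn.Linear(d_new, self.fc2.out_features)
        # Copy the old parameters into the expanded layers.
        fc1.weight[:d_old] = self.fc1.weight
        fc1.bias[:d_old] = self.fc1.bias
        fc2.weight[:, :d_old] = self.fc2.weight
        fc2.bias.copy_(self.fc2.bias)
        self.fc1, self.fc2 = fc1, fc2
        # Mask out the new units; these entries would be ramped toward 1
        # during subsequent training steps.
        mask = torch.ones(d_new)
        mask[d_old:] = 0.0
        self.mask = mask
```

Growing the model and checking strict function preservation might then look like:

```python
model = MaskedGrowthMLP(d_in=16, d_hidden=32, d_out=4)
x = torch.randn(8, 16)
y_before = model(x)
model.grow_hidden(64)                      # new units start fully masked
assert torch.allclose(y_before, model(x))  # function strictly preserved
# During continued training, model.mask[32:] would anneal from 0 to 1.
```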


