Exploiting Redundancy in Pre-trained Language Models for Efficient Transfer Learning

04/08/2020
by Fahim Dalvi, et al.

Large pre-trained contextual word representations have transformed the field of natural language processing, obtaining impressive results on a wide range of tasks. However, as models increase in size, computational limitations make them impractical for researchers and practitioners alike. We hypothesize that contextual representations have both intrinsic and task-specific redundancies. We propose a novel feature selection method, which takes advantage of these redundancies to reduce the size of the pre-trained features. In a comprehensive evaluation on two pre-trained models, BERT and XLNet, using a diverse suite of sequence labeling and sequence classification tasks, our method reduces the feature set down to 1–7% of its original size while retaining a large fraction of the performance.
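To make the idea concrete, the sketch below illustrates one plausible way to exploit the two kinds of redundancy the abstract describes: first pruning features that are intrinsically redundant (highly correlated with features already kept), then keeping only the features a linear probe finds useful for the target task. This is a minimal illustration, not the authors' exact algorithm; the function names, the correlation threshold, the probe-weight ranking, and the random toy data standing in for BERT/XLNet activations are all assumptions made for the example.

```python
# Hypothetical sketch of redundancy-based feature selection over pre-trained
# contextual features. Not the paper's exact method; thresholds, function
# names, and the toy data are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression


def prune_redundant_features(X: np.ndarray, corr_threshold: float = 0.9) -> np.ndarray:
    """Greedily drop features highly correlated with an already-kept feature."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlations
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < corr_threshold for k in kept):
            kept.append(j)
    return np.array(kept)


def rank_task_specific_features(X: np.ndarray, y: np.ndarray, top_k: int = 50) -> np.ndarray:
    """Rank features by the weight magnitude of a simple linear probe."""
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    importance = np.abs(probe.coef_).sum(axis=0)  # aggregate over classes
    return np.argsort(importance)[::-1][:top_k]


# Toy usage: random vectors stand in for contextual features (e.g. 768-dim
# per-token or per-sentence activations) with binary task labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))
y = rng.integers(0, 2, size=200)

kept = prune_redundant_features(X)                     # step 1: intrinsic redundancy
ranked = rank_task_specific_features(X[:, kept], y)    # step 2: task-specific redundancy
selected = kept[ranked]                                # indices into the original feature space

classifier = LogisticRegression(max_iter=1000).fit(X[:, selected], y)
print(f"Using {len(selected)} of {X.shape[1]} features")
```

In this kind of pipeline, the first pass is task-agnostic and can be reused across downstream tasks, while the second pass specializes the reduced feature set to a particular task before training a lightweight classifier on it.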
