Distilling BERT for low complexity network training

05/13/2021
by Bansidhar Mangalwedhekar, et al.

This paper studies the efficiency of transferring BERT's learned representations to low-complexity models such as BiLSTMs, BiLSTMs with attention, and shallow CNNs, using sentiment analysis on the SST-2 dataset. It also compares the inference complexity of BERT with that of these lower-complexity models, underlining the importance of such techniques for enabling high-performance NLP models on edge devices like mobiles, tablets, and development boards such as the Raspberry Pi, and for enabling exciting new applications.
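The abstract does not spell out the training objective, but a common way to transfer a teacher's "learnings" to a small student is soft-label knowledge distillation in the style of Hinton et al.: the student is trained against the teacher's temperature-softened logits blended with the ground-truth labels. The sketch below illustrates this for a BiLSTM student; the architecture, the `temperature` and `alpha` values, and all layer sizes are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMStudent(nn.Module):
    """Low-complexity student: embedding -> single-layer BiLSTM -> linear head."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)          # (batch, seq, embed_dim)
        _, (h_n, _) = self.bilstm(embedded)           # h_n: (2, batch, hidden_dim)
        # Concatenate final forward and backward hidden states as the sentence vector.
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)
        return self.classifier(pooled)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL loss against the teacher with hard-label cross-entropy."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 rescaling keeps soft-target gradient magnitudes comparable
    # across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a training loop, each SST-2 batch would first be scored by the fine-tuned BERT teacher under `torch.no_grad()` to produce `teacher_logits`, and the student would then be optimized against `distillation_loss`.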
