A Comparison of LSTM and BERT for Small Corpus

by Aysu Ezen-Can, et al.

Recent advances in NLP have shown that transfer learning can achieve state-of-the-art results on new tasks by fine-tuning pre-trained models rather than training from scratch. Transformers have set new state-of-the-art results for many NLP tasks, including but not limited to text classification, text generation, and sequence labeling. Most of these success stories, however, rest on large datasets. In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can a large pre-trained model like BERT outperform simpler models? To answer this question, we use a small intent-classification dataset collected for building chatbots and compare the performance of a simple bidirectional LSTM model with that of a pre-trained BERT model. Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on this small dataset, and that these simple models train in far less time than it takes to fine-tune their pre-trained counterparts. We conclude that a model's performance depends on the task and the data, so these factors should be weighed before choosing a model, rather than defaulting to the most popular one.
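The bidirectional LSTM baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, and the class name, dimensions, and vocabulary size are hypothetical. The model embeds token IDs, runs a single bidirectional LSTM layer, concatenates the final forward and backward hidden states, and projects them to intent logits.

```python
import torch
import torch.nn as nn

class BiLSTMIntentClassifier(nn.Module):
    """Hypothetical sketch of a simple BiLSTM intent classifier."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_intents):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # forward + backward hidden states are concatenated, hence 2 * hidden_dim
        self.fc = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)          # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)             # h_n: (2, batch, hidden_dim)
        # concatenate the final forward (h_n[0]) and backward (h_n[1]) states
        final = torch.cat([h_n[0], h_n[1]], dim=1)    # (batch, 2 * hidden_dim)
        return self.fc(final)                         # (batch, num_intents)

# Example: a batch of 4 utterances, each 12 tokens long (all sizes illustrative)
model = BiLSTMIntentClassifier(vocab_size=1000, embed_dim=64,
                               hidden_dim=128, num_intents=10)
batch = torch.randint(0, 1000, (4, 12))
logits = model(batch)
```

Such a model has orders of magnitude fewer parameters than BERT-base, which is one reason it can train quickly on a small corpus.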
