PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

02/25/2022
by   Yufei Wang, et al.
0

This paper focuses on the Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose Prompt-based Data Augmentation model (PromDA) which only trains small-scale Soft Prompt (i.e., a set of trainable vectors) in the frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boost up the performance of NLU models which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary with unlabeled in-domain data. The NLU models can be further improved when they are combined for training.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/03/2020

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Data augmentation techniques have been widely used to improve machine le...
research
07/11/2023

RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition

Data augmentation has been widely used in low-resource NER tasks to tack...
research
05/02/2023

Turning Flowchart into Dialog: Plan-based Data Augmentation for Low-Resource Flowchart-grounded Troubleshooting Dialogs

Flowchart-grounded troubleshooting dialogue (FTD) systems, which follow ...
research
09/09/2023

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an...
research
04/26/2023

Is a prompt and a few samples all you need? Using GPT-4 for data augmentation in low-resource classification tasks

Obtaining and annotating data can be expensive and time-consuming, espec...
research
04/27/2023

ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT

In this paper, we investigate the use of data obtained from prompting a ...

Please sign up or login with your details

Forgot password? Click here to reset