Tailor: Generating and Perturbing Text with Semantic Controls

07/15/2021
by Alexis Ross, et al.

Making controlled perturbations is essential for various tasks (e.g., data augmentation), but building task-specific generators can be expensive. We introduce Tailor, a task-agnostic generation system that perturbs text in a semantically controlled way. With unlikelihood training, we design Tailor's generator to follow a series of control codes derived from semantic roles. Through modifications of these control codes, Tailor can produce fine-grained perturbations. We implement a set of operations on control codes that can be composed into complex perturbation strategies, and demonstrate their effectiveness in three distinct applications. First, Tailor facilitates the construction of high-quality contrast sets that are lexically diverse and less biased than original task test data. Second, paired with automated labeling heuristics, Tailor helps improve model generalization through data augmentation: we obtain an average gain of 1.73 on an NLI challenge set by perturbing just 5% of the training data. Third, Tailor's perturbations effectively improve compositionality in fine-grained style transfer, outperforming fine-tuned baselines on 6 transfers.
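To make the idea of composable operations on control codes more concrete, here is a minimal sketch in Python. It is not Tailor's actual API: the `ControlCode` type, the role labels, the operation names (`swap_roles`, `change_content`, `compose`), and the bracketed `render` format are all hypothetical, chosen only to illustrate how small edits to a sequence of semantic-role control codes could be chained into a larger perturbation strategy. The generator that would turn the perturbed codes back into text is not modeled here.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Tuple


@dataclass(frozen=True)
class ControlCode:
    """Hypothetical control code: a semantic role plus an optional content hint."""
    role: str          # e.g. "AGENT", "PATIENT", "LOCATIVE" (illustrative labels)
    content: str = ""  # keyword hint for the span; "" means unconstrained


Header = Tuple[ControlCode, ...]
Operation = Callable[[Header], Header]


def swap_roles(a: str, b: str) -> Operation:
    """Return an operation that exchanges two role labels, keeping contents in place."""
    mapping = {a: b, b: a}

    def op(header: Header) -> Header:
        return tuple(replace(c, role=mapping.get(c.role, c.role)) for c in header)

    return op


def change_content(role: str, content: str) -> Operation:
    """Return an operation that overwrites the content hint of one role."""

    def op(header: Header) -> Header:
        return tuple(replace(c, content=content) if c.role == role else c
                     for c in header)

    return op


def compose(*ops: Operation) -> Operation:
    """Chain operations left to right into a single perturbation strategy."""

    def composed(header: Header) -> Header:
        return reduce(lambda h, f: f(h), ops, header)

    return composed


def render(header: Header) -> str:
    """Serialize codes into a prompt-like string (format is an assumption)."""
    return " ".join(f"[{c.role}: {c.content or '*'}]" for c in header)


original: Header = (
    ControlCode("AGENT", "the chef"),
    ControlCode("PATIENT", "the meal"),
    ControlCode("LOCATIVE", "in the kitchen"),
)

# Compose two simple edits into one strategy and apply it.
strategy = compose(swap_roles("AGENT", "PATIENT"),
                   change_content("LOCATIVE", "outdoors"))
perturbed = strategy(original)
print(render(perturbed))
```

In a setup like this, the rendered string would be handed to a conditional generator as its control prefix. Keeping each operation a pure function over an immutable header makes strategies easy to compose and leaves the original codes untouched for comparison.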
