Improving Paraphrase Detection with the Adversarial Paraphrasing Task

06/14/2021
by Animesh Nighojkar, et al.

If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we instead teach paraphrase identification models to identify paraphrases in a way that draws on the inferential properties of the sentences and does not over-rely on the lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate paraphrases that are semantically equivalent (in the sense of being mutually implicative) but lexically and syntactically disparate. These sentence pairs can then be used both to test paraphrase identification models (which perform at close to random accuracy on them) and to improve their performance. To accelerate dataset generation, we explore automating APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.
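The APT criterion described in the abstract (mutual implication combined with lexical disparity) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `entails` callable stands in for whatever textual entailment model is used, `jaccard_overlap` is a crude word-overlap proxy for lexical similarity, and the `max_overlap` threshold of 0.5 is arbitrary.

```python
from typing import Callable, Set, Tuple


def jaccard_overlap(a: str, b: str) -> float:
    """Crude lexical similarity: Jaccard overlap of lowercased word sets.

    Hypothetical stand-in for a real similarity metric.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_adversarial_paraphrase(
    s1: str,
    s2: str,
    entails: Callable[[str, str], bool],
    max_overlap: float = 0.5,  # assumed threshold, chosen only for this sketch
) -> bool:
    """True if s1 and s2 entail each other but share few words."""
    # Mutual implication: entailment must hold in both directions.
    mutually_implicative = entails(s1, s2) and entails(s2, s1)
    # Lexical disparity: word overlap must stay at or below the threshold.
    lexically_disparate = jaccard_overlap(s1, s2) <= max_overlap
    return mutually_implicative and lexically_disparate


# Toy entailment oracle: a hand-written set of (premise, hypothesis) pairs.
# A real setup would call an entailment model here instead.
TOY_ENTAILMENTS: Set[Tuple[str, str]] = {
    ("The cat chased the mouse", "A mouse was pursued by the feline"),
    ("A mouse was pursued by the feline", "The cat chased the mouse"),
    ("A dog is barking", "An animal is making noise"),  # one direction only
}


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Look up the pair in the toy table; identical sentences entail each other."""
    return premise == hypothesis or (premise, hypothesis) in TOY_ENTAILMENTS
```

Passing the entailment check in as a callable keeps the sketch independent of any particular model. A pair that entails in only one direction, or a pair that is merely a near-copy of the original sentence, fails the check.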

