Evaluating the Evaluation of Diversity in Natural Language Generation

04/06/2020
by Guy Tevet, et al.

Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics. The framework measures the correlation between a proposed diversity metric and a diversity parameter, a single parameter that controls some aspect of diversity in generated text. For example, a diversity parameter might be a binary variable used to instruct crowdsourcing workers to generate text with either low or high content diversity. We demonstrate the utility of our framework by: (a) establishing best practices for eliciting diversity judgments from humans, (b) showing that humans substantially outperform automatic metrics in estimating content diversity, and (c) demonstrating that existing methods for controlling diversity by tuning a "decoding parameter" mostly affect form but not meaning. Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
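The core idea, scoring a diversity metric by how well it tracks a known diversity parameter, can be illustrated with a minimal sketch. The abstract does not say which correlation statistic or which metrics are used, so everything below is an assumption made for illustration: Spearman rank correlation stands in for the correlation measure, distinct-n stands in for the metric under test, and the response sets and their binary low/high labels are invented toy data.

```python
from statistics import mean


def distinct_n(texts, n=1):
    """Fraction of unique n-grams over all n-grams in a set of texts.

    Used here only as a stand-in for "a proposed diversity metric".
    """
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: (diversity parameter, set of responses to one prompt).
# 0 = writers asked for low content diversity, 1 = high content diversity.
response_sets = [
    (0, ["i like pizza", "i like pizza", "i like pizza a lot"]),
    (0, ["the cat sat", "the cat sat", "the cat sat down"]),
    (1, ["i like pizza", "we went hiking", "rain is coming"]),
    (1, ["the cat sat", "dogs bark loudly", "birds fly south"]),
]

params = [param for param, _ in response_sets]
scores = [distinct_n(texts, n=1) for _, texts in response_sets]
rho = spearman(params, scores)
print(f"Spearman correlation between metric and parameter: {rho:.3f}")
```

Under this setup, a metric whose scores rise and fall with the diversity parameter earns a correlation near 1, while a metric that ignores the controlled aspect of diversity scores near 0. The same loop could be pointed at a continuous decoding parameter instead of a binary label, which is why a rank-based statistic was chosen for the sketch.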


