Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

by Shikhar Sharma, et al.

Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community for evaluating dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non-task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting, where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics correlate more strongly with human judgments in the task-oriented setting than has been observed in the non-task-oriented setting. We also observe that these metrics correlate even better for datasets that provide multiple ground-truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models, and we advocate for more challenging datasets.
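To make the idea of "correlation with human judgment" concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes a simplified word-overlap score (clipped unigram precision, roughly BLEU-1 without the brevity penalty) and Spearman rank correlation; the abstract does not say which correlation statistic or metric implementations the authors used. The function names and the toy sentences and ratings are hypothetical and serve only as illustration.

```python
from collections import Counter


def unigram_precision(candidate, references):
    """Clipped unigram precision of a candidate against one or more references.

    Simplified stand-in for a word-overlap metric (assumption: not full BLEU).
    With several references, each token is clipped by its maximum count
    across the references, so extra references can only raise the score.
    """
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref.lower().split()).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    return clipped / sum(cand_counts.values())


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: generated sentences, references, human ratings.
    generated = [
        "the hotel is in the north",
        "there is a cheap restaurant",
        "phone number unknown sorry",
    ]
    references = [
        ["the hotel is in the north part of town", "it is a hotel in the north"],
        ["there is a cheap restaurant nearby"],
        ["i do not have their phone number"],
    ]
    human_ratings = [4.5, 4.0, 2.0]  # made-up scores for illustration only

    metric_scores = [unigram_precision(g, r) for g, r in zip(generated, references)]
    print("metric scores:", metric_scores)
    print("Spearman correlation:", spearman(metric_scores, human_ratings))
```

A study of this kind would compute such a correlation per metric and per dataset. The clipping over several references in `unigram_precision` is one way to see why multiple ground-truth references might help: a valid paraphrase is less likely to be penalized when any one of the references contains its wording.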




How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

We investigate evaluation metrics for dialogue response generation syste...

Dataflow Dialogue Generation

We demonstrate task-oriented dialogue generation within the dataflow dia...

Exploring the Impact of Human Evaluator Group on Chat-Oriented Dialogue Evaluation

Human evaluation has been widely accepted as the standard for evaluating...

On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems

Automatically evaluating text-based, non-task-oriented dialogue systems ...

A Template-guided Hybrid Pointer Network for Knowledge-based Task-oriented Dialogue Systems

Most existing neural network based task-oriented dialogue systems follow...

Establishing linguistic conventions in task-oriented primeval dialogue

In this paper, we claim that language is likely to have emerged as a mec...

Efficient Task-Oriented Dialogue Systems with Response Selection as an Auxiliary Task

The adoption of pre-trained language models in task-oriented dialogue sy...
