Annotation Inconsistency and Entity Bias in MultiWOZ

by Kun Qian, et al.

MultiWOZ is one of the most popular multi-domain task-oriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG), and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs, leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in a whopping 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., "cambridge" appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed as the test set suffers from a similar entity bias as well. We release a new test set with all entities replaced with unseen entities. Finally, we benchmark joint goal accuracy (JGA) of the state-of-the-art DST baselines on these modified versions of the data. Our experiments show that the annotation inconsistency corrections lead to a 7-10% improvement in JGA. On the other hand, we observe a 29% drop in JGA when models are evaluated on the new test set with unseen entities.
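To make the two quantities in the abstract concrete, here is a minimal sketch. It assumes the standard definition of joint goal accuracy (a turn counts as correct only if every slot in the predicted dialog state matches the gold state) and a simple per-slot value share as a rough proxy for entity bias. The function names, the flat slot-to-value dictionaries, and slot names such as "train-destination" are illustrative assumptions, not the authors' evaluation code or the exact MultiWOZ schema.

```python
from typing import Dict, List

# Hypothetical flat representation of a dialog state for one turn:
# slot name -> value, e.g. {"train-destination": "cambridge"}.
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns whose predicted state equals the gold state on every slot."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # Dict equality requires identical slot sets and identical values.
    exact_matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return exact_matches / len(gold)


def value_share(states: List[DialogState], slot: str, value: str) -> float:
    """Share of turns carrying `slot` whose value is `value` (a crude entity-bias probe)."""
    values = [state[slot] for state in states if slot in state]
    if not values:
        return 0.0
    return values.count(value) / len(values)


if __name__ == "__main__":
    gold = [
        {"train-destination": "cambridge"},
        {"train-destination": "cambridge", "train-day": "monday"},
        {"train-destination": "london"},
        {"train-destination": "ely"},
    ]
    predicted = [
        {"train-destination": "cambridge"},
        {"train-destination": "cambridge", "train-day": "tuesday"},  # one wrong slot
        {"train-destination": "london"},
        {},  # missing slot
    ]
    print(joint_goal_accuracy(predicted, gold))
    print(value_share(gold, "train-destination", "cambridge"))
```

Because a single wrong or missing slot makes the whole turn count as incorrect, JGA is sensitive both to inconsistent gold annotations and to entity values a model has never seen, which is why both effects show up in this metric.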




